Apr 15

When Wearables Meet AI Responsibly

*Emerging trends in neuroscience*

Key Points:

• Lee and Calvo's 2026 paper argues that the main ethical risk in sensor-fused health agents is not only whether the model is accurate, but how biometric signals are translated into language that users experience.
• The paper proposes five design dimensions for safer health conversations: biometric disclosure, monitoring temporality, interpretation framing, AI stance, and contestability.
• For biofeedback and neurofeedback professionals, the article is a timely reminder that physiological data can inform care, but should never be delivered with more certainty than the signal truly warrants.


This paper tackles a question that is becoming harder to ignore: what happens when wearable sensor data and large language models begin speaking to users as though the body were delivering plain-English facts? In *Front-End Ethics for Sensor-Fused Health Conversational Agents: An Ethical Design Space for Biometrics*, Lee and Calvo (2026) argue that the central ethical problem is no longer confined to the back end of AI systems—the algorithms, classifiers, or fusion models—but increasingly lives at the front end, where physiological signals are converted into persuasive language.

That shift matters enormously for our field. Biofeedback is a process that helps individuals learn to regulate physiological activity by providing real-time information about bodily signals such as heart rate, breathing, skin conductance, or muscle tension. Neurofeedback is a form of biofeedback that specifically uses measures of brain activity, most commonly EEG, to support self-regulation of neural states. In both cases, the intervention depends not just on signal acquisition, but on interpretation. A number on a screen is rarely neutral once it becomes a message.

Lee and Calvo focus on sensor-fused health conversational agents—systems that combine continuous data streams from wearables or other sensors with LLM-generated dialogue. Their key warning is elegant and important: sensor data can create an illusion of objectivity. Once an LLM translates a fluctuating physiological pattern into phrases like “you are stressed” or “your recovery is poor,” uncertainty can harden into a diagnosis-like claim. For clinicians, researchers, and developers working in biofeedback or neurofeedback, this paper offers a useful ethical map for thinking about how feedback should be framed, timed, contested, and safely understood.


Methods

This paper is not an experimental trial, clinical outcome study, or technical benchmark. Instead, it is a conceptual and design-oriented analysis aimed at clarifying the ethical stakes of sensor-fused health conversational agents. That matters for interpretation: the authors are not testing whether a specific wearable chatbot improves outcomes, but rather building a framework for how such systems should communicate under conditions of uncertainty.

The paper first sketches the system architecture typical of sensor-fused health agents. As presented in the paper's architecture figure, the workflow moves through four broad stages: sensing, fusion, translation, and conversation. Sensing includes physiological and behavioural inputs such as heart rate, heart rate variability, sleep, respiration, temperature, and activity logs. Fusion refers to the modeling layer, where these time-series inputs are combined into inferred states such as stress, load, recovery, sleep stage, or anomaly detection. Translation is the pivotal step: the system turns latent model outputs into summaries, labels, narratives, and coaching hypotheses. Finally, conversation is the delivery layer, where these translations reach the user through question-answer exchanges, supportive messages, or alerts.
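To make the four-stage pipeline concrete, here is a minimal Python sketch. Every name and threshold below is an illustrative assumption, not the authors' implementation; the point is simply that uncertainty produced at the fusion stage should survive through translation and into conversation:

```python
from dataclasses import dataclass

@dataclass
class SensorWindow:
    """A window of raw wearable signals (sensing stage)."""
    heart_rate_bpm: list[float]
    hrv_rmssd_ms: list[float]
    sleep_minutes: float

@dataclass
class InferredState:
    """Fusion output: a latent state plus the uncertainty behind it."""
    label: str         # e.g. "elevated arousal"
    confidence: float  # 0..1, carried forward so translation can hedge

def fuse(window: SensorWindow) -> InferredState:
    """Fusion stage: combine time-series features into an inferred state.
    A stand-in for the paper's modeling layer; a real system would use
    validated models, not a single threshold."""
    mean_hrv = sum(window.hrv_rmssd_ms) / len(window.hrv_rmssd_ms)
    if mean_hrv < 25:
        return InferredState("elevated arousal", confidence=0.6)
    return InferredState("typical range", confidence=0.8)

def translate(state: InferredState) -> str:
    """Translation stage: turn the latent state into hedged user-facing
    language. This is where, per Lee and Calvo, certainty gets inflated."""
    if state.confidence < 0.7:
        return f"There are some signs of {state.label}, but the signal is ambiguous."
    return f"Your readings look within your {state.label}."

def converse(message: str) -> str:
    """Conversation stage: deliver the translation while inviting
    reflection rather than asserting fact."""
    return message + " Does that match how you feel right now?"

print(converse(translate(fuse(SensorWindow([72.0, 75.0], [22.0, 24.0], 410.0)))))
```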

The authors then distinguish between two interaction contexts. In pull interactions, the user initiates the exchange and is generally more prepared to reflect on their data. In push interactions, the system generates an alert without being asked, often when the person may be busy, anxious, socially exposed, or physiologically activated. This distinction drives much of the paper’s ethical argument, because the same message may be tolerable in a reflective pull context and destabilizing in a push context.
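As a toy illustration of how that distinction might be operationalized, here is a minimal sketch; the severity score, context flags, and thresholds are my own assumptions, not taken from the paper:

```python
def should_push(alert_severity: float, in_meeting: bool, acutely_aroused: bool) -> bool:
    """Push/pull sketch: interrupt unprompted only when severity clearly
    outweighs the cost of interrupting a busy or physiologically
    activated user. Thresholds are illustrative assumptions."""
    threshold = 0.9 if (in_meeting or acutely_aroused) else 0.6
    return alert_severity >= threshold
```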

From there, Lee and Calvo propose a five-dimensional ethical design space. The dimensions are: data disclosure, monitoring temporality, interpretation framing, AI stance, and contestability. Rather than treating these as fixed design choices, the paper frames them as adjustable levers that should shift according to risk, vulnerability, context, and user state. The authors also propose adaptive guardrails, including “adaptive disclosure” and brief verification-in-the-loop routines, as practical front-end safeguards.
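One way to picture the design space is as a configuration object whose fields are the five levers. This is a paraphrase in code, with enum names of my own choosing rather than the paper's exact terminology:

```python
from dataclasses import dataclass
from enum import Enum

class Disclosure(Enum):
    IMPLICIT = "summaries only"
    EXPLICIT = "raw metrics shown"

class Temporality(Enum):
    ON_DEMAND = "pull: user asks"
    CONTINUOUS = "push: system alerts"

class Framing(Enum):
    REFLECTIVE = "does this fit your experience?"
    DIRECTIVE = "do X now"

class Stance(Enum):
    HUMBLE = "hedged, provisional"
    AUTHORITATIVE = "declarative"

class Contestability(Enum):
    FIXED = "user cannot correct labels"
    CORRECTABLE = "user can reject or revise labels"

@dataclass
class FrontEndPolicy:
    """One point in the five-dimensional design space. The paper treats
    these as levers to shift with risk, vulnerability, context, and state."""
    disclosure: Disclosure
    temporality: Temporality
    framing: Framing
    stance: Stance
    contestability: Contestability

# A conservative setting for a potentially vulnerable user:
safe_default = FrontEndPolicy(
    Disclosure.IMPLICIT,
    Temporality.ON_DEMAND,
    Framing.REFLECTIVE,
    Stance.HUMBLE,
    Contestability.CORRECTABLE,
)
```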


Results

Because this is a conceptual workshop paper, it does not report participant outcomes, effect sizes, randomized comparisons, or longitudinal behavioural data. There are no symptom-change metrics, adherence results, or user-experience statistics. That absence is important: the paper advances a design proposal, not empirical proof that a given interface reduces harm.

Its main “results,” then, are best understood as conceptual outputs.

First, the paper identifies a core ethical problem in sensor-fused health agents: the illusion of objectivity. Physiological signals often feel authoritative to users, even when they are noisy, ambiguous, or context-sensitive. Once an LLM converts those signals into natural-language claims, especially in a confident tone, users may treat an uncertain inference as settled fact. The authors argue that this can amplify familiar LLM risks—hallucination, overconfidence, emotional persuasion, and dependency—while adding specifically health-related risks such as anxiety amplification, self-misinterpretation, and nocebo-like responses.

Second, the five-dimensional design space provides a useful structure for anticipating where harm may emerge. The dimensions move from implicit to explicit disclosure, on-demand to continuous monitoring, reflective to directive framing, humble to authoritative AI stance, and fixed to correctable interpretations. Each dimension is paired with an ethical tension: transparency versus anxiety, intervention versus dependence, autonomy versus cognitive load, safety versus overtrust, and helplessness versus empowerment.

Third, the paper proposes adaptive front-end guardrails. The most clinically relevant suggestion is that systems should shift toward safer communication styles when users may be vulnerable—for example, reducing raw-data disclosure, softening interpretive certainty, and favouring reflective prompts over declarative labels. The authors also recommend short verification routines, such as translating a signal first as “heightened arousal” and then asking whether recent exercise, caffeine, excitement, or anxiety might explain the pattern before finalizing interpretation.
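A rough sketch of what such a guardrail could look like in code, with wording and the 0.7 cutoff as pure illustration rather than anything the authors specify:

```python
def guarded_message(confidence: float, user_vulnerable: bool) -> str:
    """Adaptive guardrail sketch: soften disclosure and certainty when
    the user may be vulnerable, and run a verification-in-the-loop
    question before finalizing an interpretation."""
    if user_vulnerable or confidence < 0.7:
        # Reflective, hedged framing plus a verification question
        return ("I'm noticing signs of heightened arousal. Could recent "
                "exercise, caffeine, excitement, or anxiety explain this "
                "before we read anything more into it?")
    # Lower-risk context: a fuller, but still hedged, summary
    return ("Your recent readings suggest elevated arousal relative to "
            "your baseline. Would you like to see the underlying trend?")
```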

In other words, the paper’s contribution is a practical ethical vocabulary for designing user-facing biometric translation under uncertainty.


Discussion

This paper is especially interesting for biofeedback and neurofeedback because it reframes a problem we already know well: signals do not speak for themselves. Clinically, physiological and EEG data are only useful when they are interpreted in context, with appropriate humility, and with enough collaboration that the person receiving feedback remains an active meaning-maker rather than a passive recipient of machine authority.

What the paper shows clearly is that when biometric data are routed through a conversational agent, the danger is not merely in a false reading. The deeper risk is rhetorical inflation. A probabilistic pattern becomes a sentence, the sentence sounds authoritative, and the authority changes behaviour. A wearable may detect elevated heart rate, lower HRV, fragmented sleep, or unusual activity. None of those states maps cleanly onto a single psychological meaning. Yet a system that declares “you are stressed” may provoke worry, increase vigilance, intensify bodily arousal, and create exactly the kind of feedback loop the system claims to be identifying. That is a remarkably biofeedback-relevant insight.

For clients, the practical implication is simple but important: physiological feedback can support self-awareness, but it can also over-shape self-narrative when delivered too forcefully. Some users benefit from numbers and trend data; others become more anxious, obsessive, or overly dependent on external interpretation. This paper therefore supports a clinically familiar principle: more data is not always more regulation. Sometimes less explicit feedback, delivered at the right time and with the right framing, is the more therapeutic choice.

For referring professionals and multidisciplinary teams, the paper is also a useful caution. As AI-powered health platforms become more common, clinicians may increasingly encounter patients who arrive with sensor-derived interpretations already woven into their self-understanding. “My ring says I’m not recovering.” “My app says I’m chronically stressed.” “The system detected burnout.” These statements may carry the emotional weight of diagnosis even when the underlying inference is weak. The paper’s emphasis on contestability is particularly valuable here. Users should be able to correct the system, reject labels, and escalate ambiguous or high-risk concerns to a human professional.

For neurofeedback professionals, the relevance is immediate. EEG neurofeedback already involves translating complex, probabilistic signals into user-facing targets and expectations. That translation is ideally collaborative, state-dependent, and shaped gradually through successive approximations rather than blunt declarations about what the brain “is.” Lee and Calvo’s framework extends that logic into the AI era. A conversational layer attached to EEG, HRV, skin conductance, respiration, or sleep tracking should not pretend to know more than the measurement can justify. It should support self-regulation, not replace clinical reasoning.

The most useful interpretive contribution of the paper is its insistence that front-end design is not a cosmetic matter. In health technology, wording is intervention. Timing is intervention. Disclosure level is intervention. Whether a message is push or pull is intervention. That idea aligns closely with what many experienced biofeedback clinicians discover in practice: the same signal can calm, confuse, motivate, shame, or dysregulate depending on how it is introduced and scaffolded.

At the same time, the paper has real limits. It is a concise conceptual piece rather than a full empirical study. It does not test whether adaptive disclosure reduces anxiety, whether reflective framing improves adherence, or whether users with different symptom profiles need distinct guardrail settings. It also does not address the full complexity of clinical triage, reimbursement, implementation burden, or how such systems should behave in populations with OCD, panic, illness anxiety, autism, trauma, or severe depression—groups in whom feedback language may land very differently. So while the framework is strong, it remains a framework.

Even with that limitation, the paper lands an important point: sensor-fused systems should be designed not merely to detect patterns, but to preserve autonomy while handling uncertainty. That goal is deeply compatible with good biofeedback and neurofeedback practice.


Brendan’s perspective

This paper lands right on the fault line of where our field is heading. When wearables meet AI, biofeedback is no longer confined to the clinic, the training room, or the carefully supervised session. Signals that used to live inside clinical software are now being collected continuously by rings, watches, chest straps, headbands, and phones, then routed into systems that do something very powerful: they narrate the body back to the person in plain language.

That is exciting. It is also where we need to grow up as a field.

Because this integration is not coming someday. It is already here. Consumer platforms are increasingly translating heart rate, HRV, sleep stages, skin temperature, activity load, and in some cases EEG-like proxies into interpretations, recommendations, and pseudo-clinical summaries. Add conversational AI, and suddenly the person is no longer just seeing a graph. They are being told a story about themselves: that they are stressed, under-recovered, dysregulated, resilient, inflamed, cognitively overloaded, perhaps even on the verge of burnout. And once a system tells that story fluently enough, many users will treat it as truth.

This is exactly why I think biofeedback professionals, neurofeedback clinicians, researchers, and developers need to help guide this integration rather than stand on the sidelines and complain about it after the fact. If the future is sensor-fused, AI-mediated self-regulation, then our responsibility is to bring the field’s hard-earned wisdom into that future: respect the signal, respect context, respect uncertainty, and never confuse a metric with a person.

There is an old principle that absolutely belongs here: garbage in, garbage out. In biofeedback and neurofeedback, we have always known that the elegance of the interface means very little if the underlying signal is poor. If the sensor is noisy, if the electrode contact is unstable, if the sampling is inadequate, if motion artifact is mistaken for physiology, if the transformation pipeline is opaque, or if the metric itself is only weakly related to the construct being described, then the downstream language will not become more valid just because it sounds polished. It will simply become more persuasive.

And persuasive nonsense is more dangerous than obvious nonsense.

That point applies just as much to EEG neurofeedback as it does to wearables. A conversational system layered on top of EEG should not speak as though a single threshold crossing at Cz, Fz, C3, C4, Pz, or wherever we happen to be training provides direct access to a person’s mental state in all its richness. Changes in SMR, alpha, theta/beta ratios, beta inhibition, alpha-theta transitions, coherence estimates, or asymmetry metrics can be clinically useful within a carefully bounded framework. But these are not self-interpreting truths. They are signal features embedded in an assessment model, a training model, and a relational context. The clinician’s job is not merely to reward frequencies. It is to shape learning without overstating what the signal means.

So when I think about AI entering our space, I do not want systems that sound more certain than a careful clinician would sound. I want systems that inherit the maturity of good practice. That means several things.

First, signal quality has to come before storytelling. If a platform cannot tell the user what sensor is being used, where it is placed, what is actually being measured, how artifacts are handled, what transformation is applied, and how a summary metric is derived, then the interpretation should remain correspondingly modest. Black-box health narration is not maturity. It is marketing.

Second, clinical-grade applications should be built on transparent physiological logic. If you are going to tell someone that their arousal is elevated, I want to know whether that claim rests on heart rate alone, HRV features, electrodermal activity, respiration, motion, time-of-day effects, or some fusion model that has been validated in a population resembling the user. If you are going to interpret EEG, I want clarity on montage, referencing, artifact rejection, band definitions, reward and inhibit criteria, state dependence, and whether the training goal is trait change, state regulation, skill acquisition, or all three. In other words, the system should show its work—at least enough that clinicians and informed users can understand the chain from raw signal to advice.
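To show what "showing its work" might mean in practice, here is a hypothetical provenance structure that travels with every claim. The field names are assumptions for illustration, not a published schema:

```python
from dataclasses import dataclass

@dataclass
class Provenance:
    """The chain from raw signal to advice, kept inspectable."""
    sensor: str             # e.g. "wrist PPG"
    placement: str          # e.g. "left wrist"
    features: list[str]     # e.g. ["heart rate", "RMSSD"]
    artifact_handling: str  # e.g. "motion-gated windows"
    model: str              # e.g. "population model, adults 18-65"
    validation_note: str    # who the model was validated on

@dataclass
class Interpretation:
    claim: str
    provenance: Provenance
    confidence: float

    def show_work(self) -> str:
        """Render the chain so clinicians and informed users can audit
        the path from raw signal to advice."""
        p = self.provenance
        return (f"{self.claim} (confidence {self.confidence:.0%}), based on "
                f"{', '.join(p.features)} from {p.sensor} at {p.placement}; "
                f"artifacts: {p.artifact_handling}; model: {p.model}; "
                f"validated on: {p.validation_note}.")
```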

Third, we need to preserve the central insight of biofeedback: the goal is self-regulation, not algorithmic dependence. Good biofeedback makes the person more attuned, more flexible, and less reliant on external prompts over time. Poorly designed AI could push in the opposite direction. It could train people to outsource interoception to dashboards and to trust the machine's description of their body more than their own felt experience. That would be a perverse outcome for a field supposedly devoted to agency.

This is especially relevant for anxious, obsessive, perfectionistic, or high-monitoring clients. Some individuals will benefit from seeing trend data and gentle reflective prompts. Others will become increasingly preoccupied with recovery scores, readiness flags, or physiological deviations that may be trivial or ambiguous. In those populations, the ethical question is not “How much can we measure?” but “What presentation of this measurement helps regulation rather than rumination?” Sometimes the most clinically intelligent interface is not the most detailed one. It is the one that knows when not to escalate meaning.

I also think our field has something important to contribute to AI design philosophically. Biofeedback and neurofeedback have always sat at the intersection of signal, learning, and relationship. The feedback itself is only one part of the intervention. Equally important are shaping, state-dependent training, contextualization, collaborative interpretation, and transfer into daily life. Those principles should inform the next generation of AI health systems. A mature system would not simply say, “Your biometrics indicate stress.” It might say, “There are signs of elevated arousal. Would you like to check whether this fits exercise, time pressure, caffeine, pain, excitement, or worry?” That is a very different posture. Same data, much healthier psychology.

So yes, I think this article can absolutely be pitched as a "when wearables meet AI" piece. But I would add this: the real opportunity is not just integration. It is responsible integration. We need better sensors, better validation, better transparency, better language, and better humility. We need systems that can acknowledge uncertainty without becoming useless, and offer guidance without becoming authoritarian.

If we do this well, AI could genuinely extend the reach of biofeedback—bringing moment-to-moment physiological awareness, personalised coaching, and even bridges into EEG neurofeedback-informed care to people who otherwise might never access it. But that will only happen if we resist the temptation to turn noisy physiology into overconfident narrative. The future of our field should not be built on magical interpretations of mediocre signals. It should be built on careful measurement, transparent transformation, clinically grounded framing, and a relentless commitment to helping people become better regulators of their own minds and bodies.


Conclusion

Lee and Calvo’s paper is brief, but it addresses a big emerging problem with admirable precision. As health technologies become increasingly capable of merging wearables, behavioural logs, and language models, the ethical question is no longer only whether the model can detect a pattern. It is whether the system can translate that pattern in a way that supports reflection without overstating certainty, provoking anxiety, or eroding agency.

That is where this article feels especially relevant to the worlds of biofeedback and neurofeedback. Our field has always lived at the boundary between signal and meaning. We work with real physiology, but we also know that physiology must be interpreted carefully, collaboratively, and in context. Lee and Calvo bring that same principle into the design of sensor-fused conversational agents.

The most constructive takeaway is not anti-technology. It is pro-humility. Better health agents will not be the ones that sound most confident. They will be the ones that communicate uncertainty well, invite correction, adapt their disclosure to user readiness, and preserve the person’s role as an active participant in regulation. That is a promising direction—and one well worth watching.


References

Lee, H., & Calvo, R. A. (2026). Front-end ethics for sensor-fused health conversational agents: An ethical design space for biometrics. In *Proceedings of the CHI 2026 Workshop: Ethics at the Front-End: Responsible User-Facing Design for AI Systems*.
