Neary Martha, Fulton Emily, Rogers Victoria, Wilson Julia, Griffiths Zoe, Chuttani Ram, Sacher Paul M
Vir Health Ltd., London, United Kingdom.
Allurion Technologies, Natick, MA, United States.
Front Digit Health. 2025 Jun 18;7:1460236. doi: 10.3389/fdgth.2025.1460236. eCollection 2025.
Developments in machine-learning-based conversational and generative artificial intelligence (GenAI) have created opportunities for sophisticated conversational agents to augment elements of healthcare. While not a replacement for professional care, AI offers opportunities for scalability, cost-effectiveness, and automation of many aspects of patient care. However, to realize these opportunities and deliver AI-enabled support safely, interactions between patients and AI must be continuously monitored and evaluated against an agreed-upon set of performance criteria. This paper presents one such set of criteria, which was developed to evaluate interactions with an AI Health Coach designed to support patients receiving obesity treatment and deployed with an active patient user base. The evaluation framework evolved through an iterative process of development, testing, refining, training, reviewing, and supervision. The framework evaluates interactions at both the individual-message level and the overall-conversation level, rating them as Acceptable or Unacceptable in four domains: Fidelity, Accuracy, Safety, and Tone (FAST), with a series of questions to be considered for each domain. Processes to ensure consistent evaluation quality were established, and additional patient safety procedures were defined for escalations to healthcare providers based on clinical risk. The framework can be implemented by trained evaluators and offers a method by which healthcare settings deploying AI to support patients can review quality and safety, thus ensuring safe adoption.
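The two-level rating structure described in the abstract can be illustrated with a small data model. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the class names, field names, and the `flagged_domains` helper are hypothetical, and the rule that a domain is flagged when it is rated Unacceptable at either the message level or the conversation level is an illustrative assumption that the abstract does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum


class Domain(Enum):
    # The four FAST domains named in the abstract.
    FIDELITY = "Fidelity"
    ACCURACY = "Accuracy"
    SAFETY = "Safety"
    TONE = "Tone"


class Rating(Enum):
    # The two rating values named in the abstract.
    ACCEPTABLE = "Acceptable"
    UNACCEPTABLE = "Unacceptable"


def _all_acceptable():
    # Hypothetical default: every domain starts as Acceptable.
    return {domain: Rating.ACCEPTABLE for domain in Domain}


@dataclass
class MessageEvaluation:
    # Ratings for one AI message (hypothetical structure).
    message_id: str
    ratings: dict = field(default_factory=_all_acceptable)


@dataclass
class ConversationEvaluation:
    # Ratings for the conversation as a whole, plus its message-level ratings.
    conversation_id: str
    overall: dict = field(default_factory=_all_acceptable)
    messages: list = field(default_factory=list)

    def flagged_domains(self):
        # Assumed rule: a domain is flagged if it is Unacceptable
        # at the conversation level or in any single message.
        flagged = {d for d, r in self.overall.items()
                   if r is Rating.UNACCEPTABLE}
        for message in self.messages:
            flagged |= {d for d, r in message.ratings.items()
                        if r is Rating.UNACCEPTABLE}
        # Return in the fixed FAST order for stable reporting.
        return [d for d in Domain if d in flagged]


# Usage: one message with an Unacceptable tone, and a conversation-level
# Safety concern recorded by the evaluator.
msg = MessageEvaluation("m1")
msg.ratings[Domain.TONE] = Rating.UNACCEPTABLE
conv = ConversationEvaluation("c1", messages=[msg])
conv.overall[Domain.SAFETY] = Rating.UNACCEPTABLE
print([d.value for d in conv.flagged_domains()])
```

Keeping message-level and conversation-level ratings as separate records means an evaluator can note a problem that only emerges across several turns without having to attribute it to a single message.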