A Novel Playbook for Pragmatic Trial Operations to Monitor and Evaluate Ambient Artificial Intelligence in Clinical Practice.

Author information

Afshar Majid, Resnik Felice, Baumann Mary Ryan, Hintzke Josie, Lemmon Kayla, Sullivan Anne Gravel, Shah Tina, Stordalen Anthony, Oberst Michael, Dambach Jason, Mrotek Leigh Ann, Quinn Mariah, Abramson Kirsten, Kleinschmidt Peter, Brazelton Tom, Twedt Heidi, Kunstman David, Wills Graham, Long John, Patterson Brian W, Liao Frank J, Rasmussen Stacy, Burnside Elizabeth, Goswami Cherodeep, Gordon Joel E

Affiliations

Institute for Clinical and Translational Research, School of Medicine and Public Health, University of Wisconsin, Madison.

Department of Medicine, School of Medicine and Public Health, University of Wisconsin, Madison.

Publication information

NEJM AI. 2025 Sep;2(9). doi: 10.1056/aidbp2401267. Epub 2025 Aug 28.

DOI:10.1056/aidbp2401267

PMID:40959192

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12435388/

Abstract

BACKGROUND

Ambient artificial intelligence (AI) offers the potential to reduce documentation burden and improve efficiency through clinical note generation. Widespread adoption, however, remains limited due to challenges in electronic health record (EHR) integration, coding compliance, and real-world evaluation. This study introduces a framework and protocols to design, monitor, and deploy ambient AI within routine care.

METHODS

We launched an implementation phase to build technical workflows, establish governance, and inform a pragmatic randomized trial. A bidirectional governance model linked operations and research through multidisciplinary workgroups that incorporated the Systems Engineering Initiative for Patient Safety (SEIPS) framework. Integration into the EHR used Fast Healthcare Interoperability Resources (FHIR), and a real-time dashboard tracked utilization and documentation accuracy. To monitor drift, a difference-in-differences analysis was applied to three process metrics: time in notes, work outside work, and utilization. Audits of International Classification of Diseases, Tenth Revision (ICD-10) compliance were performed using an internally developed large language model (LLM), the validity of which was assessed through correlation with certified professional coders.
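As a rough illustration of the drift-monitoring step, the sketch below computes a basic two-group, two-period difference-in-differences estimate for one process metric. The abstract does not specify the comparison groups, time windows, or statistical model, so the function name `did_estimate`, the group labels, and all numbers are hypothetical and synthetic; this is not the authors' analysis.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Return the 2x2 difference-in-differences estimate of group means.

    Assumption: a simple comparison of mean change in one group against
    mean change in a comparison group, with no covariates or clustering.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Synthetic "time in notes" values (minutes per provider); illustrative only.
ambient_pre = [30.0, 34.0, 32.0]
ambient_post = [24.0, 27.0, 27.0]
comparison_pre = [31.0, 33.0, 35.0]
comparison_post = [30.0, 33.0, 33.0]

effect = did_estimate(ambient_pre, ambient_post, comparison_pre, comparison_post)
print(f"Difference-in-differences estimate: {effect:.1f} minutes")
```

In this toy data the ambient group changes by -6 minutes and the comparison group by -1 minute, so the estimate is -5 minutes. A monitoring dashboard could recompute such an estimate per period and flag shifts, though a production analysis would more plausibly use a regression model with uncertainty estimates.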

RESULTS

Ambient AI utilization, measured as the proportion of eligible clinical notes completed using the system, had a weighted median of 65.4% (interquartile range, 50.6 to 84.0%). Iterative improvement cycles targeted task-specific adoption. A brief workflow issue related to a note template change initially reduced ICD-10 documentation accuracy from 79% (95% confidence interval [CI], 72 to 86%) to 35% (95% CI, 28 to 42%); accuracy returned to baseline after note template redesign and user training. The internally developed LLM coder achieved a strong correlation with professional coders (Pearson's r=0.97). The trial enrolled 66 providers across eight specialties, powered at 90% for the primary outcome of provider well-being.
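The two summary statistics reported above, a weighted median with interquartile range and a Pearson correlation, can be sketched as follows. The abstract does not state how utilization was weighted or at what level coder agreement was computed, so the sketch assumes weighting by each provider's count of eligible notes and uses hypothetical helper names (`weighted_quantile`, `pearson_r`) with synthetic data.

```python
import math


def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight reaches q of the total.

    Assumption: weights are per-provider eligible-note counts.
    """
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Synthetic per-provider utilization and eligible-note counts; illustrative only.
utilization = [0.40, 0.55, 0.70, 0.90]
eligible_notes = [100, 300, 400, 200]

median = weighted_quantile(utilization, eligible_notes, 0.5)
q1 = weighted_quantile(utilization, eligible_notes, 0.25)
q3 = weighted_quantile(utilization, eligible_notes, 0.75)
print(f"Weighted median {median:.0%} (IQR {q1:.0%} to {q3:.0%})")

# Synthetic accuracy scores from an LLM coder and human coders; illustrative only.
llm_scores = [0.60, 0.72, 0.80, 0.91]
human_scores = [0.58, 0.75, 0.79, 0.93]
print(f"Pearson r = {pearson_r(llm_scores, human_scores):.2f}")
```

Weighting by note volume keeps low-volume providers from dominating the summary; other weighting schemes are equally plausible given what the abstract reports.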

CONCLUSIONS

We provide a publicly available framework and protocols to help safely implement ambient AI in health care. Innovations include an embedded pragmatic trial design, human factors engineering, compliance-driven feedback loops, and real-time monitoring to support deployment, ensuring fidelity before initiation of the clinical trial. (Funded by the University of Wisconsin Hospital and Clinics and the National Institutes of Health Clinical and Translational Science Award; NIH/NCATS UL1TR002737; ClinicalTrials.gov number, NCT06517082.)
