Biro Joshua M, Handley Jessica L, Mickler James, Reddy Sahithi, Kottamasu Varsha, Ratwani Raj M, Cobb Nathan K
MedStar Health National Center for Human Factors in Healthcare, Washington, DC 20008, United States.
Georgetown University Medical Center, Washington, DC 20007, United States.
J Am Med Inform Assoc. 2025 May 1;32(5):928-931. doi: 10.1093/jamia/ocaf052.
The objective of this work is to demonstrate the value of simulation testing for rapidly evaluating artificial intelligence (AI) products.
Researcher-physician teams simulated the use of 2 Ambient Digital Scribe (ADS) products by reading scripts of outpatient encounters while using both products, yielding a total of 44 draft notes. Time to edit, perceived amount of effort and editing, and errors in the AI-generated draft notes were analyzed.
Draft notes from ADS Product A took significantly longer to edit and contained fewer omissions, but more additions and more irrelevant or misplaced text errors, than draft notes from ADS Product B. ADS Product A was rated as performing better for most encounters.
Artificial intelligence-enabled products are being developed and implemented into practice so rapidly that adoption is outpacing attention to safety concerns. Simulation testing can efficiently identify safety issues.
Simulation testing is a crucial first step to take when evaluating AI-enabled technologies.