Lukac Paul J, Turner William, Vangala Sitaram, Chin Aaron T, Khalili Joshua, Shih Ya-Chen Tina, Sarkisian Catherine, Cheng Eric M, Mafi John N
Department of Pediatrics, David Geffen School of Medicine; UCLA Health Information Technology, UCLA Health, University of California, Los Angeles, Los Angeles, CA, United States.
Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States.
medRxiv. 2025 Jul 11:2025.07.10.25331333. doi: 10.1101/2025.07.10.25331333.
Ambient artificial intelligence (AI) scribes record patient encounters and generate visit notes almost instantaneously, representing a promising solution to documentation burden and associated physician burnout. Despite swift and widespread adoption of AI scribes, their impacts have not been examined in randomized clinical trials.
To test the effectiveness of two AI scribes in reducing time spent writing notes and associated burnout in a randomized clinical trial.
Parallel three-arm pragmatic randomized clinical trial in which physicians were assigned 1:1:1 via covariate-constrained randomization (balancing on time-in-note, baseline burnout score, and clinic days per week) to one of two AI scribe applications (Microsoft DAX or Nabla) or to a usual-care control group, from 11/4/2024 to 1/3/2025.
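The allocation step can be illustrated with a minimal Python sketch of covariate-constrained randomization. It assumes a simple procedure: draw many candidate 1:1:1 allocations, score each by covariate imbalance, then pick one at random from the best-balanced subset. The imbalance metric, the number of candidates, the retained fraction, and the synthetic physician data are all illustrative assumptions, not the trial's actual procedure.

```python
import random
import statistics

N_ARMS = 3  # two AI scribe arms plus usual care


def make_synthetic_physicians(n, seed=0):
    """Hypothetical data: (time-in-note, burnout score, clinic days/week)."""
    rng = random.Random(seed)
    return [
        (rng.uniform(2.0, 12.0), rng.uniform(10.0, 50.0), rng.randint(1, 5))
        for _ in range(n)
    ]


def imbalance(assignment, columns, sds):
    """Sum, over covariates, of the standardized gap between the highest
    and lowest arm mean. Smaller means better balance."""
    score = 0.0
    for column, sd in zip(columns, sds):
        arm_means = [
            statistics.fmean(v for v, a in zip(column, assignment) if a == arm)
            for arm in range(N_ARMS)
        ]
        score += (max(arm_means) - min(arm_means)) / sd
    return score


def constrained_randomize(covariates, n_candidates=500, keep_fraction=0.1, seed=0):
    """Return one arm label (0, 1, or 2) per physician."""
    rng = random.Random(seed)
    columns = list(zip(*covariates))
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    labels = [i % N_ARMS for i in range(len(covariates))]  # near-equal arm sizes
    scored = []
    for _ in range(n_candidates):
        candidate = labels[:]
        rng.shuffle(candidate)
        scored.append((imbalance(candidate, columns, sds), candidate))
    # Keep only the best-balanced allocations, then choose one at random.
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(n_candidates * keep_fraction))
    return rng.choice(scored[:n_keep])[1]
```

Calling `constrained_randomize(make_synthetic_physicians(238))` returns one arm label per physician. Choosing at random within the retained subset, rather than always taking the single best allocation, keeps the assignment random while limiting imbalance.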
A large academic health system in California.
313 outpatient physicians were recruited based on leadership referrals and department-wide emails. 238 participants representing 14 specialties qualified.
Intervention-arm physicians gained access to an AI scribe for two months.
The primary outcome was change from baseline in log writing time-in-note. Secondary outcomes, measured by surveys, included Mini-Z 2.0, 4-item physician task load (TL), and Professional Fulfillment Index-Work Exhaustion (PFI-WE) scores to evaluate aspects of burnout, work environment, and stress, as well as targeted questions addressing safety and accuracy.
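As a small illustration of what a change on the log scale means, the sketch below computes a log-scale change from baseline and the relative change it implies. The function names and the example minute values are hypothetical, and the abstract does not state how log-scale estimates were converted to the percentages it reports, so this shows only the generic arithmetic.

```python
import math


def log_change(baseline_minutes, followup_minutes):
    """Change from baseline on the log scale: log(follow-up) - log(baseline)."""
    return math.log(followup_minutes) - math.log(baseline_minutes)


def percent_change(log_diff):
    """Relative change implied by a log-scale difference, in percent."""
    return (math.exp(log_diff) - 1.0) * 100.0


# Hypothetical physician: 10 minutes per note at baseline, 9 at follow-up.
delta = log_change(10.0, 9.0)
print(f"log change = {delta:.4f}, implied change = {percent_change(delta):.1f}%")
```

Working on the log scale makes changes multiplicative, so the same proportional reduction counts equally for physicians with short and long baseline documentation times.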
DAX was used in 33.5% of 24,696 visits; Nabla was used in 29.5% of 23,653 visits. Nabla users experienced a 9.5% decrease in time-in-note versus the control group (95% CI: -17.2%, -1.8%; p=.02) and a 7.8% decrease versus DAX users (95% CI: -15.5%, -0.1%; p=.05), while DAX users exhibited no significant change versus control (-1.7% [-9.4%, +5.9%]; p=.66). Total Mini-Z score, scaled 10-50 with higher scores indicating improvement, increased among users of any scribe (+2.76 [+1.41, +4.10]; p<.001). Reductions in TL (scale 0-400; -35.8 [-63.7, -7.9]; p=.01) and work exhaustion (PFI-WE, scale 0-4; -0.27 [-0.48, -0.07]; p=.01) were seen among users of any scribe. One Grade 1 (mild) adverse event was reported, and clinically significant inaccuracies were noted "occasionally" on 5-point Likert questions (DAX 2.7 [2.4-3.0] vs. Nabla 2.8 [2.6-3.0]; p=.68).
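A reader can sanity-check the reported estimates, 95% confidence intervals, and p-values against one another. The sketch below assumes the intervals are symmetric Wald intervals from an approximately normal test statistic, which the abstract does not state; the helper `p_from_ci` is a hypothetical name, and the input numbers are copied from the results above.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def p_from_ci(estimate, lower, upper):
    """Approximate two-sided p-value implied by a symmetric 95% CI."""
    se = (upper - lower) / (2.0 * Z_95)  # back out the standard error
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))


# (estimate, lower bound, upper bound) as reported in the results
reported = {
    "Nabla vs control, time-in-note (%)": (-9.5, -17.2, -1.8),
    "Nabla vs DAX, time-in-note (%)": (-7.8, -15.5, -0.1),
    "DAX vs control, time-in-note (%)": (-1.7, -9.4, 5.9),
    "Any scribe, Mini-Z": (2.76, 1.41, 4.10),
    "Any scribe, task load": (-35.8, -63.7, -7.9),
    "Any scribe, PFI-WE": (-0.27, -0.48, -0.07),
}

for label, (est, lo, hi) in reported.items():
    print(f"{label}: implied p = {p_from_ci(est, lo, hi):.4f}")
```

Under this normal approximation the implied p-values appear consistent with the reported ones after rounding, although the trial's actual model may have used a different reference distribution.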
Use of Nabla reduced time-in-note, while use of any scribe led to modest improvements in physician burnout, work exhaustion, and task load. Performance was remarkably similar across two distinct vendor platforms, and occasional inaccuracies observed in either scribe require ongoing physician vigilance.
ClinicalTrials.gov Identifier: NCT06792890.