Huang Jonathan, Wittbrodt Matthew T, Teague Caitlin N, Karl Eric, Galal Galal, Thompson Michael, Chapa Ajay, Chiu Ming-Lun, Herynk Bradley, Linchangco Richard, Serhal Ali, Heller J Alex, Abboud Samir F, Etemadi Mozziyar
Department of Radiology, Northwestern University Feinberg School of Medicine, Chicago, Illinois.
Department of Biomedical Engineering, Northwestern University, Evanston, Illinois.
JAMA Netw Open. 2025 Jun 2;8(6):e2513921. doi: 10.1001/jamanetworkopen.2025.13921.
IMPORTANCE: Diagnostic imaging interpretation involves distilling multimodal clinical information into text form, a task well-suited to augmentation by generative artificial intelligence (AI). However, to our knowledge, impacts of AI-based draft radiological reporting remain unstudied in clinical settings.
OBJECTIVE: To prospectively evaluate the association of radiologist use of a workflow-integrated generative model capable of providing draft radiological reports for plain radiographs across a tertiary health care system with documentation efficiency, the clinical accuracy and textual quality of final radiologist reports, and the model's potential for detecting unexpected, clinically significant pneumothorax.
DESIGN, SETTING, AND PARTICIPANTS: This prospective cohort study was conducted from November 15, 2023, to April 24, 2024, at a tertiary care academic health system. The association between use of the generative model and radiologist documentation efficiency was evaluated for radiographs documented with model assistance compared with a baseline set of radiographs without model use, matched by study type (chest or nonchest). Peer review was performed on model-assisted interpretations. Flagging of pneumothorax requiring intervention was performed on radiographs prospectively.
MAIN OUTCOMES AND MEASURES: The primary outcomes were association of use of the generative model with radiologist documentation efficiency, assessed by difference in documentation time with and without model use using a linear mixed-effects model; for peer review of model-assisted reports, the difference in Likert-scale ratings using a cumulative-link mixed model; and for flagging pneumothorax requiring intervention, sensitivity and specificity.
RESULTS: A total of 23 960 radiographs (11 980 each with and without model use) were used to analyze documentation efficiency. Interpretations with model assistance (mean [SE], 159.8 [27.0] seconds) were faster than the baseline set of those without (mean [SE], 189.2 [36.2] seconds) (P = .02), representing a 15.5% documentation efficiency increase. Peer review of 800 studies showed no difference in clinical accuracy (χ² = 0.68; P = .41) or textual quality (χ² = 3.62; P = .06) between model-assisted interpretations and nonmodel interpretations. Moreover, the model flagged studies containing a clinically significant, unexpected pneumothorax with a sensitivity of 72.7% and specificity of 99.9% among 97 651 studies screened.
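As a reading aid, the arithmetic behind the reported efficiency figure and the two screening metrics can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, the 15.5% figure is reproduced here simply from the two reported mean times (the study itself used a linear mixed-effects model), and no confusion-matrix counts are given in the abstract, so none are assumed.

```python
def relative_reduction(baseline: float, assisted: float) -> float:
    """Fractional reduction in documentation time relative to baseline."""
    return (baseline - assisted) / baseline


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of studies with the finding that were flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of studies without the finding that were not flagged."""
    return true_neg / (true_neg + false_pos)


# Mean documentation times (seconds) as reported in the abstract.
baseline_mean_s = 189.2
assisted_mean_s = 159.8

pct = 100 * relative_reduction(baseline_mean_s, assisted_mean_s)
print(f"{pct:.1f}%")  # 15.5%
```

The sensitivity and specificity helpers are shown only to make the definitions concrete; applying them requires the per-study counts, which the abstract does not report.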
CONCLUSIONS AND RELEVANCE: In this prospective cohort study of clinical use of a generative model for draft radiological reporting, model use was associated with improved radiologist documentation efficiency while maintaining clinical quality and demonstrated potential to detect studies containing a pneumothorax requiring immediate intervention. This study suggests the potential for radiologist and generative AI collaboration to improve clinical care delivery.