Department of Emergency Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois.
Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, Illinois.
JAMA Netw Open. 2023 Oct 2;6(10):e2336100. doi: 10.1001/jamanetworkopen.2023.36100.
IMPORTANCE: Multimodal generative artificial intelligence (AI) methodologies have the potential to optimize emergency department care by producing draft radiology reports from input images.
OBJECTIVE: To evaluate the accuracy and quality of AI-generated chest radiograph interpretations in the emergency department setting.
DESIGN, SETTING, AND PARTICIPANTS: This retrospective diagnostic study included 500 randomly sampled emergency department encounters at a tertiary care institution from January 2022 to January 2023, each with a chest radiograph interpreted by both a teleradiology service and an on-site attending radiologist. An AI interpretation was generated for each radiograph. The 3 radiograph interpretations were each rated in duplicate by 6 emergency department physicians using a 5-point Likert scale.
MAIN OUTCOMES AND MEASURES: The primary outcome was any difference in Likert scores among radiologist, AI, and teleradiology reports, assessed using a cumulative link mixed model. Secondary analyses compared the probability of each report type containing no clinically significant discrepancy, with further stratification by finding presence, using a logistic mixed-effects model. Physician comments on discrepancies were recorded.
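For orientation, the following is a minimal sketch of the standard form of these two models; the crossed random-intercept structure for rater and study shown here is an assumption, as the abstract does not specify the exact random-effects structure. Letting $Y_{ijr}$ denote the 5-point Likert rating given by physician $i$ to study $j$ under report type $r \in \{\text{radiologist}, \text{AI}, \text{teleradiology}\}$, a cumulative link (proportional odds) mixed model takes the form

\[
\operatorname{logit} P(Y_{ijr} \le k) = \theta_k - \left(\beta_r + u_i + v_j\right), \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad v_j \sim \mathcal{N}(0, \sigma_v^2),
\]

with ordered category thresholds $\theta_1 < \dots < \theta_4$, fixed report-type effects $\beta_r$, and crossed random intercepts $u_i$ (rater) and $v_j$ (study). The secondary logistic mixed-effects model would keep the same fixed and random structure but substitute a binary outcome $D_{ijr}$ (1 = the report contains no clinically significant discrepancy):

\[
\operatorname{logit} P(D_{ijr} = 1) = \alpha + \beta_r + u_i + v_j.
\]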
RESULTS: A total of 500 ED studies were included from 500 unique patients with a mean (SD) age of 53.3 (21.6) years; 282 patients (56.4%) were female. There was a significant association of report type with ratings, with post hoc tests revealing significantly greater scores for AI (mean [SE] score, 3.22 [0.34]; P < .001) and radiologist (mean [SE] score, 3.34 [0.34]; P < .001) reports compared with teleradiology (mean [SE] score, 2.74 [0.34]) reports. AI and radiologist reports were not significantly different. On secondary analysis, there was no difference in the probability of no clinically significant discrepancy between the 3 report types. Further stratification of reports by presence of cardiomegaly, pulmonary edema, pleural effusion, infiltrate, pneumothorax, and support devices also yielded no difference in the probability of containing no clinically significant discrepancy between the report types.
CONCLUSIONS AND RELEVANCE: In a representative sample of emergency department chest radiographs, results suggest that the generative AI model produced reports of similar clinical accuracy and textual quality to radiologist reports while providing higher textual quality than teleradiologist reports. Implementation of the model in the clinical workflow could enable timely alerts to life-threatening pathology while aiding imaging interpretation and documentation.