Savage Cody H, Tanwar Manoj, Elkassem Asser Abou, Sturdivant Adam, Hamki Omar, Sotoudeh Houman, Sirineni Gopi, Singhal Aparna, Milner Desmin, Jones Jesse, Rehder Dirk, Li Mei, Li Yufeng, Junck Kevin, Tridandapani Srini, Rothenberg Steven A, Smith Andrew D
Department of Diagnostic Radiology & Nuclear Medicine, University of Maryland Medical Intelligent Imaging (UM2ii) Center, University of Maryland School of Medicine, Baltimore, MD.
Department of Radiology, University of Alabama at Birmingham Heersink School of Medicine, Birmingham, AL.
AJR Am J Roentgenol. 2024 Nov;223(5):e2431639. doi: 10.2214/AJR.24.31639. Epub 2024 Sep 4.
Retrospective studies evaluating artificial intelligence (AI) algorithms for intracranial hemorrhage (ICH) detection on noncontrast CT (NCCT) have shown promising results but lack prospective validation. The purpose of this article was to evaluate the impact of a radiology department's implementation of an AI triage and notification system for ICH detection on head NCCT examinations on radiologists' real-world aggregate performance for ICH detection and on report turnaround times for ICH-positive examinations. This prospective single-center study included adult patients who underwent head NCCT examinations from May 12, 2021, to June 30, 2021 (phase 1), or from September 30, 2021, to December 4, 2021 (phase 2). Before phase 1, the radiology department implemented a commercial AI triage system for ICH detection that processed head NCCT examinations and notified radiologists of positive results through a widget with a floating pop-up display. Examinations were interpreted by neuroradiologists or emergency radiologists, who evaluated examinations without and with AI assistance in phases 1 and 2, respectively. A panel of radiologists reviewed all examinations with discordance between the radiology report and AI, as well as a subset of the remaining examinations, to establish the reference standard. Diagnostic performance and report turnaround times were compared using the Pearson chi-square test and the Wilcoxon rank sum test, respectively. Bonferroni correction was applied to account for five diagnostic performance metrics (adjusted significance threshold, .01 [α = .05/5]). A total of 9954 examinations from 7371 patients (mean age, 54.8 ± 19.8 [SD] years; 3773 women, 3598 men) were included. In phases 1 and 2, 19.8% (735/3716) and 21.9% (1368/6238) of examinations, respectively, were positive for ICH (p = .01). Radiologists without versus with AI showed no significant difference in accuracy (99.5% vs 99.2%), sensitivity (98.6% vs 98.9%), PPV (99.0% vs 97.5%), or NPV (99.7% vs 99.7%) (all p > .01); specificity was higher for radiologists without than with AI (99.8% vs 99.3%, respectively; p = .004). Mean report turnaround time for ICH-positive examinations was 147.1 minutes without AI versus 149.9 minutes with AI (p = .11). An AI triage system for ICH detection did not improve radiologists' diagnostic performance or report turnaround times. This large prospective real-world study does not support the use of AI assistance for ICH detection.
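As an illustration only (not part of the published study), the statistical comparisons described in the abstract could be sketched in Python with scipy: a Pearson chi-square test per diagnostic metric evaluated against a Bonferroni-adjusted threshold (α = .05/5 = .01), and a Wilcoxon rank sum test for report turnaround times. All counts and times below are placeholders, not study data.

```python
# Minimal sketch, assuming placeholder counts and turnaround times; it mirrors
# the analysis strategy described in the abstract, not the authors' actual code.
import numpy as np
from scipy.stats import chi2_contingency, ranksums

ALPHA = 0.05
N_METRICS = 5                      # accuracy, sensitivity, specificity, PPV, NPV
ADJ_THRESHOLD = ALPHA / N_METRICS  # Bonferroni-adjusted threshold = .01

def compare_proportions(hits_a, n_a, hits_b, n_b):
    """Pearson chi-square test comparing one metric (e.g., sensitivity)
    between phase 1 (without AI) and phase 2 (with AI)."""
    table = np.array([[hits_a, n_a - hits_a],
                      [hits_b, n_b - hits_b]])
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    return p

# Hypothetical example: sensitivity without vs with AI.
p_sens = compare_proportions(hits_a=725, n_a=735, hits_b=1353, n_b=1368)
print(f"sensitivity p = {p_sens:.3f}, significant = {p_sens < ADJ_THRESHOLD}")

# Wilcoxon rank sum test on turnaround times (minutes) for ICH-positive exams.
tat_without_ai = np.random.default_rng(0).normal(147, 40, 200)  # placeholder data
tat_with_ai = np.random.default_rng(1).normal(150, 40, 200)     # placeholder data
stat, p_tat = ranksums(tat_without_ai, tat_with_ai)
print(f"turnaround-time p = {p_tat:.3f}")
```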