Sun Ju, Peng Le, Li Taihui, Adila Dyah, Zaiman Zach, Melton-Meaux Genevieve B, Ingraham Nicholas E, Murray Eric, Boley Daniel, Switzer Sean, Burns John L, Huang Kun, Allen Tadashi, Steenburg Scott D, Gichoya Judy Wawira, Kummerfeld Erich, Tignanelli Christopher J
Department of Computer Science and Engineering (J.S., L.P., T.L., D.A., D.B.), Institute for Health Informatics (G.B.M.M., E.K., C.J.T.), Department of Surgery (G.B.M.M., C.J.T.), Department of Medicine, Division of Pulmonary and Critical Care (N.E.I.), Department of Medicine (S.S.), and Department of Radiology (T.A.), University of Minnesota, 420 Delaware St SE, Minneapolis, MN 55455; Departments of Computer Science (Z.Z.) and Radiology (J.W.G.), Emory University, Atlanta, Ga; M Health Fairview Informatics, Minneapolis, Minn (E.M.); The School of Medicine (J.L.B., K.H.) and Department of Radiology (S.D.S.), Indiana University, Indianapolis, Ind; and Department of Surgery, North Memorial Health Hospital, Robbinsdale, Minn (C.J.T.).
Radiol Artif Intell. 2022 Jun 1;4(4):e210217. doi: 10.1148/ryai.210217. eCollection 2022 Jul.
To conduct a prospective observational study across 12 U.S. hospitals to evaluate real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs.
A total of 95 363 chest radiographs were included in model training, external validation, and real-time validation. The model was deployed as a clinical decision support system, and its performance was evaluated prospectively. There were 5335 real-time predictions in total, with a COVID-19 prevalence of 4.8% (258 of 5335). Model performance was assessed with use of receiver operating characteristic analysis, precision-recall curves, and the F1 score. Logistic regression was used to evaluate the association of race and sex with AI model diagnostic accuracy. To compare model accuracy with the performance of board-certified radiologists, a third dataset of 1638 images was read independently by two radiologists.
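As a minimal sketch of the evaluation metrics named above (area under the receiver operating characteristic curve and F1 score), the following computes both from scratch on toy data. The labels, scores, and 0.5 threshold are illustrative assumptions, not the study's data or operating point.

```python
# Illustrative sketch of the abstract's metrics (AUC, F1) on toy data;
# the inputs here are hypothetical, not the study's.

def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U rank statistic: the probability that a
    randomly chosen positive case scores higher than a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, preds):
    """Harmonic mean of precision and recall for binary predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

labels = [1, 1, 1, 0, 0, 0, 0, 0]                 # hypothetical ground truth
scores = [0.8, 0.6, 0.1, 0.4, 0.1, 0.0, 0.0, 0.2] # hypothetical model scores
preds = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold
print(round(roc_auc(labels, scores), 3))  # → 0.833
print(round(f1_score(labels, preds), 3))  # → 0.8
```

A precision-recall curve, also used in the study, would come from sweeping that threshold over the sorted scores rather than fixing it at 0.5.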
Participants positive for COVID-19 had higher COVID-19 diagnostic scores than participants negative for COVID-19 (median, 0.1 [IQR, 0.0-0.8] vs 0.0 [IQR, 0.0-0.1], respectively; P < .001). Real-time model performance was unchanged over 19 weeks of implementation (area under the receiver operating characteristic curve, 0.70; 95% CI: 0.66, 0.73). Model sensitivity was higher in men than in women (P = .01), whereas model specificity was higher in women (P = .001). Sensitivity was higher for Asian (P = .002) and Black (P = .046) participants than for White participants. The COVID-19 AI diagnostic system was less accurate (63.5% correct) than the radiologists (radiologist 1, 67.8% correct; radiologist 2, 68.6% correct; McNemar P < .001 for both).
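The model-versus-radiologist comparison above uses McNemar's test, which is appropriate here because the two classifiers are scored on the same paired cases. A minimal sketch, using hypothetical discordant-pair counts rather than the study's data:

```python
# Hedged sketch of McNemar's test for paired classifiers; the counts
# below are hypothetical, not taken from the study.
import math

def mcnemar(b, c):
    """McNemar chi-square with continuity correction.
    b = cases only classifier A got right, c = cases only classifier B
    got right (concordant pairs do not enter the statistic).
    Returns (statistic, two-sided p-value, 1 df)."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square(1 df) survival function via the complementary error
    # function: P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

stat, p = mcnemar(b=40, c=90)  # hypothetical discordant counts
print(round(stat, 2))
print(p < .001)  # → True: the two classifiers differ significantly
```

The test depends only on the discordant pairs, so a large imbalance between b and c drives the significance regardless of how many cases both readers classified identically.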
AI-based tools have not yet reached their full diagnostic potential for COVID-19 and underperform compared with radiologist predictions.
Keywords: Diagnosis, Classification, Application Domain, Infection, Lung
© RSNA, 2022.