From the Department of Radiology, Herlev and Gentofte Hospital, Borgmester Ib Juuls vej 1, 2730 Herlev, Copenhagen, Denmark (L.L.P., F.C.M., L.C.L., M.B.A.); Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark (L.L.P., O.W.N., M.B., M.B.A.); Radiological Artificial Intelligence Testcenter, RAIT.dk, Capital region of Denmark (L.L.P., F.C.M., J.D.N., M.B., M.B.A.); Department of Radiology, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark (J.D.N., M.B.); Department of Radiology, Aarhus University Hospital, Aarhus, Denmark (F.R.); and Department of Cardiology, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark (O.W.N.).
Radiology. 2023 May;307(3):e222268. doi: 10.1148/radiol.222268. Epub 2023 Mar 7.
Background Automated interpretation of normal chest radiographs could alleviate the workload of radiologists. However, the performance of such an artificial intelligence (AI) tool compared with clinical radiology reports has not been established.

Purpose To perform an external evaluation of a commercially available AI tool with regard to the number of chest radiographs reported autonomously, the sensitivity of AI detection of abnormal chest radiographs, and the performance of AI compared with that of the clinical radiology reports.

Materials and Methods In this retrospective study, consecutive posteroanterior chest radiographs from adult patients at four hospitals in the capital region of Denmark were obtained in January 2020, including images from emergency department patients, in-hospital patients, and outpatients. Three thoracic radiologists established the reference standard by labeling each chest radiograph, on the basis of its findings, as critical, other remarkable, unremarkable, or normal (no abnormalities). AI classified chest radiographs as high-confidence normal (normal) or not high-confidence normal (abnormal).

Results A total of 1529 patients were included for analysis (median age, 69 years [IQR, 55-69 years]; 776 women); 1100 (72%) were classified by the reference standard as having abnormal radiographs, 617 (40%) as having critical abnormal radiographs, and 429 (28%) as having normal radiographs. For comparison, clinical radiology reports were classified on the basis of the report text, and insufficient reports were excluded (n = 22). The sensitivity of AI was 99.1% (95% CI: 98.3, 99.6; 1090 of 1100 patients) for abnormal radiographs and 99.8% (95% CI: 99.1, 99.9; 616 of 617 patients) for critical radiographs. The corresponding sensitivities of the radiologist reports were 72.3% (95% CI: 69.5, 74.9; 779 of 1078 patients) and 93.5% (95% CI: 91.2, 95.3; 558 of 597 patients), respectively.
Specificity of AI, and hence the potential autonomous reporting rate, was 28.0% of all normal posteroanterior chest radiographs (95% CI: 23.8, 32.5; 120 of 429 patients), or 7.8% (120 of 1529 patients) of all posteroanterior chest radiographs.

Conclusion Of all normal posteroanterior chest radiographs, 28% were autonomously reported by AI, with a sensitivity for any abnormality higher than 99%. This corresponded to 7.8% of the entire posteroanterior chest radiograph production.

© RSNA, 2023. See also the editorial by Park in this issue.
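The reported rates reduce to simple ratios of counts. As a rough illustration (not part of the paper), the short Python sketch below recomputes the point estimates from the counts given in the abstract, and includes a Wilson score interval as one plausible way to obtain a 95% CI for a proportion; the abstract does not state which CI method the authors actually used, so the interval function is an assumption for illustration only.

```python
from math import sqrt


def proportion(k: int, n: int) -> float:
    """Point estimate of a rate (e.g., sensitivity or specificity) in percent."""
    return 100 * k / n


def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion, in percent.
    One common choice; the abstract does not specify the method used."""
    p = k / n
    centre = p + z * z / (2 * n)
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return 100 * (centre - half) / denom, 100 * (centre + half) / denom


# AI sensitivity for any abnormality: 1090 of 1100 abnormal radiographs
print(round(proportion(1090, 1100), 1))  # 99.1
# AI specificity (potential autonomous reporting rate): 120 of 429 normal radiographs
print(round(proportion(120, 429), 1))    # 28.0
# Share of the whole radiograph production reported autonomously: 120 of 1529
print(round(proportion(120, 1529), 1))   # 7.8
```

Note that the specificity here doubles as the autonomous reporting rate: every radiograph the tool classifies as high-confidence normal is, by design, a radiograph it could report without a radiologist.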