Bhatia Bahadar S, Morlese John F, Yusuf Sarah, Xie Yiting, Schallhorn Bob, Gruen David
Directorate of Diagnostic Radiology, Sandwell & West Birmingham NHS Trust, Lyndon, West Bromwich B71 4HJ, United Kingdom.
Space Research Centre, Physics & Astronomy, University of Leicester, 92 Corporation Road, Leicester LE4 5SP, United Kingdom.
BJR Open. 2023 Dec 12;6(1):tzad009. doi: 10.1093/bjro/tzad009. eCollection 2024 Jan.
This diagnostic study retrospectively assessed radiologists' accuracy using the deep learning and natural language processing chest algorithms implemented in Clinical Review version 3.2 for pneumothorax and rib fractures on digital chest X-ray radiographs (CXR), and for aortic aneurysm, pulmonary nodules, emphysema, and pulmonary embolism on CT images.
The study design was double-blind (artificial intelligence [AI] algorithms and humans), retrospective, non-interventional, and conducted at a single NHS Trust. Adult patients (≥18 years old) scheduled for CXR and CT were invited to enroll as participants through an opt-out process. Reports and images were de-identified and processed retrospectively, and AI-flagged discrepant findings were assigned to two lead radiologists, each blinded to patient identifiers and to the original reporting radiologist. For each clinical condition, every flagged finding was tallied as a verified discrepancy (true positive) or not (false positive).
The missed findings were: rib fractures, 0.02%; aortic aneurysm, 0.51%; pulmonary nodules, 0.32%; emphysema, 0.92%; and pulmonary embolism, 0.28%. The positive predictive values (PPVs) were: pneumothorax, 0%; rib fractures, 5.6%; aortic dilatation, 43.2%; pulmonary emphysema, 46.0%; pulmonary embolus, 11.5%; and pulmonary nodules, 9.2%. The PPV for pneumothorax was nil owing to the lack of available outpatient studies for analysis.
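For context on the PPV figures above: positive predictive value is the number of verified discrepancies (true positives) divided by all AI-flagged findings (true plus false positives). A minimal sketch of that arithmetic, using hypothetical counts chosen for illustration only (the paper reports percentages, not the underlying counts):

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: verified discrepancies / all AI-flagged findings."""
    flagged = true_positives + false_positives
    if flagged == 0:
        # No flagged studies available for a condition yields a nil PPV,
        # as occurred for pneumothorax in this study.
        return 0.0
    return true_positives / flagged

# Hypothetical counts, not taken from the study: 16 verified out of 37 flags.
print(round(100 * ppv(16, 21), 1))  # prints 43.2
```

The zero-flag branch matters in practice: a PPV of 0% can reflect an absence of analysable studies rather than uniformly incorrect flags.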
The number of missed findings was far lower than generally predicted. The chest algorithms deployed retrospectively were a useful quality tool, and AI augmented the radiologists' workflow.
The diagnostic accuracy of our radiologists yielded missed findings of 0.02% for rib fractures on CXR, and 0.51% for aortic dilatation, 0.32% for pulmonary nodules, 0.92% for pulmonary emphysema, and 0.28% for pulmonary embolism on CT studies, all evaluated retrospectively with AI used as a quality tool to flag potential missed findings. It is important to account for the prevalence of these chest conditions in clinical context and to use appropriate clinical thresholds for decision-making, rather than relying solely on AI.