Chen Yingming Amy, Hu Zixuan, Shek Kevin D, Wilson Jefferson, Alotaibi Fahad Saud S, Witiw Christopher D, Lin Hui Ming, Ball Robyn L, Patel Markand, Mathur Shobhit, Sejdić Ervin, Colak Errol
Department of Medical Imaging, Faculty of Medicine, University of Toronto, Toronto, ON, Canada.
Department of Medical Imaging, St Michael's Hospital, Unity Health Toronto, 30 Bond St, Toronto, ON M5B 1W8, Canada.
AJR Am J Roentgenol. 2025 Mar;224(3):e2432076. doi: 10.2214/AJR.24.32076. Epub 2025 Jan 8.
Available data on radiologists' missed cervical spine fractures are based primarily on studies using human reviewers to identify errors on reevaluation; such studies do not capture the full extent of missed fractures. The purpose of this study was to use machine learning (ML) models to identify cervical spine fractures on CT missed by interpreting radiologists, characterize the nature of these fractures, and assess their clinical significance. This retrospective study included all cervical spine CT examinations performed in adult patients in the emergency department between January 1, 2018, and December 31, 2022. Examinations reported as negative for cervical spine fracture were processed by seven award-winning ML models from the 2022 Radiological Society of North America Cervical Spine Fracture AI Challenge; examinations classified as positive by at least four of the seven models were considered to have ML-detected fractures. Two neuroradiologists independently reviewed examinations with ML-detected fractures using ML-derived heat maps to identify those representing true missed fractures. The neuroradiologists further assessed the fractures' extent. Two spine surgeons independently assessed whether missed fractures were clinically significant (i.e., warranting at least one of surgical consultation, MRI, CTA, or collar immobilization). The study included 6671 patients (2414 women, 4257 men; mean age, 54.6 ± 22.1 [SD] years) who underwent a total of 6979 cervical spine CT examinations. Interpreting radiologists reported 6378 examinations as negative for fracture. Of these, 356 had ML-detected fractures (i.e., positive by at least four of seven models). The neuroradiologists classified 40 of these examinations, in 39 unique patients, as having true fractures. ML-detected missed true fractures involved 51 unique sites, most commonly the C7 transverse process (n = 12), C5 spinous process (n = 12), and C6 spinous process (n = 8).
The surgeons considered missed fractures clinically significant in 15 of 40 examinations (MRI and collar immobilization [n = 7], MRI and surgical evaluation [n = 1], CTA [n = 9]). Interobserver agreement, expressed as kappa, was 0.88 between neuroradiologists for true fracture classification and 0.94 between surgeons for clinical significance classification. ML models identified cervical spine fractures missed by radiologists. These fractures were further characterized to systematically highlight radiologists' common misses. This ML-based framework can be applied in quality improvement efforts, to help refine radiologists' search patterns based on prone-to-miss findings.
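The screening rule described above (an examination counts as having an ML-detected fracture when at least four of the seven models classify it as positive) can be illustrated with a minimal Python sketch. The function name `ml_detected_fracture`, the examination identifiers, and the per-model votes are hypothetical and are not taken from the study; only the 4-of-7 threshold and the three counts (6378, 356, 40) come from the abstract. The second part of the sketch simply restates the reported review funnel as proportions.

```python
from typing import Sequence


def ml_detected_fracture(model_votes: Sequence[bool], min_positive: int = 4) -> bool:
    """Return True when at least `min_positive` models call the exam positive."""
    return sum(bool(vote) for vote in model_votes) >= min_positive


# Hypothetical per-model outputs (seven models per examination).
exams = {
    "exam_a": [True, True, True, True, False, False, False],   # 4 of 7 positive
    "exam_b": [True, True, True, False, False, False, False],  # 3 of 7 positive
    "exam_c": [True] * 7,                                      # 7 of 7 positive
}
flagged = [exam_id for exam_id, votes in exams.items() if ml_detected_fracture(votes)]
print(flagged)  # ['exam_a', 'exam_c']

# Review funnel using the counts reported in the abstract.
negative_reports = 6378   # examinations reported as negative for fracture
ml_flagged = 356          # of those, positive by at least four of seven models
confirmed_missed = 40     # of those, classified as true fractures on review

print(f"Flagged by ML:          {ml_flagged / negative_reports:.1%}")        # 5.6%
print(f"Confirmed among flagged: {confirmed_missed / ml_flagged:.1%}")       # 11.2%
print(f"Confirmed among negative reports: {confirmed_missed / negative_reports:.2%}")  # 0.63%
```

Read this way, roughly one in nine ML-flagged examinations was confirmed as a true missed fracture, which corresponds to about 0.6% of all examinations originally reported as negative.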
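The abstract reports interobserver agreement as kappa but does not state which variant was used. Assuming Cohen's kappa for two raters, the statistic can be sketched as follows. The function `cohen_kappa` and the two rating lists are hypothetical illustrations and do not reproduce the study's data or its reported values of 0.88 and 0.94.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten examinations (1 = true fracture, 0 = no fracture).
reader_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 2))  # 0.78
```

In this toy example the readers agree on 9 of 10 examinations (observed agreement 0.90), but 0.54 agreement would be expected by chance from their marginal rates, so kappa is (0.90 - 0.54) / (1 - 0.54), or about 0.78.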