University Hospitals Cleveland Medical Center, Case Western Reserve University, Cleveland, Ohio, USA (T.F., V.V., V.K., R.B., L.K.B., N.F.).
Acad Radiol. 2024 May;31(5):1989-1999. doi: 10.1016/j.acra.2023.10.042. Epub 2023 Nov 22.
To evaluate the stand-alone performance of a deep learning (DL)-based fracture detection tool on extremity radiographs and to assess the performance of radiologists and emergency physicians in identifying fractures of the extremities with and without the DL aid.
The DL tool was previously developed using 132,000 appendicular skeletal radiographs divided into 87% training, 11% validation, and 2% test sets. Stand-alone performance was evaluated on 2626 de-identified radiographs from a single institution in Ohio, including at least 140 exams per body region. Consensus from three US board-certified musculoskeletal (MSK) radiologists served as ground truth. A multi-reader retrospective study was performed in which 24 readers (eight each of emergency physicians, non-MSK radiologists, and MSK radiologists) identified fractures in 186 cases during two independent sessions, one with and one without DL aid, separated by a one-month washout period. The accuracy (area under the receiver operating characteristic curve), sensitivity, specificity, and reading time were compared with and without model aid.
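As a rough illustration of the three diagnostic metrics named above, the following minimal sketch computes sensitivity, specificity, and the area under the receiver operating characteristic curve (the quantity the abstract reports as "accuracy") from binary ground-truth labels and model scores. The function names, the toy data, and the 0.5 decision threshold are all assumptions made for illustration; none of them comes from the study or its software.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = fracture)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not study data.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
# Assumed decision threshold of 0.5 for turning scores into calls.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason a study may report all three.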
The model achieved a stand-alone accuracy of 0.986, sensitivity of 0.987, and specificity of 0.885, and maintained high accuracy (>0.95) across strata defined by body part, age, gender, radiographic view, and scanner type. With DL aid, reader accuracy increased by 0.047 (95% CI: 0.034, 0.061; p = 0.004) and sensitivity significantly improved from 0.865 (95% CI: 0.848, 0.881) to 0.955 (95% CI: 0.944, 0.964). Average reading time was shortened by 7.1 s (27%) per exam. When stratified by physician type, this improvement was greater for emergency physicians and non-MSK radiologists.
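The reported reading-time and sensitivity figures imply a few quantities that the abstract does not state directly. The short sketch below derives them by simple arithmetic; the derived values are approximations that inherit the rounding of the reported 7.1 s and 27%, and the variable names are illustrative only.

```python
# Reported in the abstract.
saved_seconds = 7.1        # mean reading time saved per exam
saved_fraction = 0.27      # the same saving expressed as a fraction
sens_unaided = 0.865
sens_aided = 0.955

# Derived (approximate, not reported): implied mean reading times.
baseline_seconds = saved_seconds / saved_fraction
aided_seconds = baseline_seconds - saved_seconds

# Derived: absolute gain in reader sensitivity.
sens_gain = sens_aided - sens_unaided

print(f"implied unaided reading time: about {baseline_seconds:.1f} s")
print(f"implied aided reading time:   about {aided_seconds:.1f} s")
print(f"absolute sensitivity gain:    {sens_gain:.3f}")
```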
The DL tool demonstrated high stand-alone accuracy, aided physician diagnostic accuracy, and decreased interpretation time.