Department of Diagnostic Radiology, Jinling Hospital, Medical School of Nanjing University, 305 East Zhongshan Rd, Nanjing, 210002, China.
Tencent Jarvis Lab, Shenzhen, 518000, China.
Eur Radiol. 2023 Jan;33(1):555-565. doi: 10.1007/s00330-022-08950-w. Epub 2022 Jun 24.
To assess the feasibility of deep learning-based diagnostic models for detecting lower-extremity fatigue fractures and grading their severity on plain radiographs.
This retrospective study included 1151 X-ray images (tibiofibula/foot: 682/469) of fatigue fractures and 2842 X-ray images (tibiofibula/foot: 2000/842) without abnormal findings from two clinical centers. After the lesions were labeled, images from one center (tibiofibula/foot: 2539/1180) were allocated at a ratio of 7:1:2 for model construction, and the remaining images from the other center (tibiofibula/foot: 143/131) were used for external validation. A ResNet-50 and a triplet branch network were adopted to construct diagnostic models for detection and grading. The performance of the detection models was evaluated with sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), while the grading models were evaluated with accuracy derived from the confusion matrix. Visual assessments by radiologists were performed for comparison with the models.
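The evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the labels, scores, threshold, and confusion matrix are made-up toy values rather than study data, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation, which may differ from the tooling used in the study.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = fracture, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case scores higher than a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_from_confusion(matrix: List[List[int]]) -> float:
    """Overall accuracy: correctly graded cases (the diagonal) divided by all cases."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05]
toy_threshold = 0.5  # assumed operating point
toy_preds = [1 if s >= toy_threshold else 0 for s in toy_scores]

# Rows are true grades, columns are predicted grades (three hypothetical grades).
toy_confusion = [
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 3],
]

if __name__ == "__main__":
    sens, spec = sensitivity_specificity(toy_labels, toy_preds)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
    print(f"AUC={auc(toy_labels, toy_scores):.3f}")
    print(f"grading accuracy={accuracy_from_confusion(toy_confusion):.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are typically reported together for a detection model.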
For the tibiofibula detection model, a sensitivity of 95.4%/85.5%, a specificity of 80.1%/77.0%, and an AUC of 0.965/0.877 were achieved in the internal testing/external validation sets. The foot detection model reached a sensitivity of 96.4%/90.8%, a specificity of 76.0%/66.7%, and an AUC of 0.947/0.911. The detection models outperformed the junior radiologist and performed comparably to the intermediate and senior radiologists. The overall grading accuracy of the diagnostic model was 78.5%/62.9% for tibiofibula and 74.7%/61.1% for foot in the internal testing/external validation sets.
The deep learning-based models could be applied to the radiological reading of plain radiographs to assist in detecting and grading fatigue fractures of the tibiofibula and foot.
• Fatigue fractures are relatively difficult to detect on radiographs and are prone to misdiagnosis.
• Detection and grading models based on deep learning were constructed on a large cohort of radiographs with lower-extremity fatigue fractures.
• The detection model, with its high sensitivity, would help reduce the misdiagnosis of lower-extremity fatigue fractures.