Zheng Zhiqian, Ryu Byeong Y, Kim Sung E, Song Dae S, Kim Seong H, Park Jung-Wee, Ro Du H
Department of Orthopedic Surgery, Seoul National University Hospital, Seoul, South Korea.
Department of Orthopedic Surgery, Seoul National University College of Medicine, Seoul, South Korea.
Bone Joint J. 2025 Feb 1;107-B(2):213-220. doi: 10.1302/0301-620X.107B2.BJJ-2024-0791.R1.
The aim of this study was to develop and evaluate a deep learning-based model for classification of hip fractures to enhance diagnostic accuracy.
This retrospective study used 5,168 anteroposterior hip radiographs: 4,493 from two institutes (internal dataset) for training, and 675 from a third institute for external validation. A convolutional neural network (CNN)-based classification model was trained on four types of hip fracture (Displaced, Valgus-impacted, Stable, and Unstable), using DAMO-YOLO for data processing and augmentation. The model's accuracy, sensitivity, specificity, Intersection over Union (IoU), and Dice coefficient were evaluated. Orthopaedic surgeons' diagnoses served as the reference standard, and diagnostic performance was compared before and after artificial intelligence assistance.
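The five evaluation metrics named above can all be written in terms of true/false positive and negative counts for a single fracture class. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the `Counts` class, the `class_metrics` function, and the example numbers are hypothetical, and the abstract does not state whether IoU and Dice were computed on boxes, pixels, or cases.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one class (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(num: float, den: float) -> float:
    # Return NaN rather than raising when a denominator is zero.
    return num / den if den else float("nan")


def class_metrics(c: Counts) -> dict:
    """Standard one-vs-rest metrics computed from confusion counts."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # IoU (Jaccard index): overlap divided by union.
        "iou": _ratio(c.tp, c.tp + c.fp + c.fn),
        # Dice coefficient: twice the overlap divided by the summed sizes.
        "dice": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the study.
    print(class_metrics(Counts(tp=8, fp=2, tn=88, fn=2)))
```

When both are computed from the same counts, Dice and IoU are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.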
In the internal dataset, the accuracy, sensitivity, specificity, IoU, and Dice coefficient of the model for the four fracture categories were: Displaced (1.0, 0.79, 1.0, 0.70, 0.82), Valgus-impacted (1.0, 0.80, 1.0, 0.70, 0.82), Stable (0.99, 0.95, 0.99, 0.83, 0.89), and Unstable (1.0, 0.98, 0.99, 0.86, 0.92). In the external validation dataset, the sensitivity and specificity were: Displaced (0.83, 0.94), Valgus-impacted (0.89, 0.90), Stable (0.88, 0.95), and Unstable (0.85, 0.99). The overall means for the external dataset, given in the same order, were 0.83 (SD 0.05) and 0.96 (SD 0.01) for the micro-average (Micro AVG), and 0.69 (SD 0.02) and 0.95 (SD 0.02) for the macro-average (Macro AVG).
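Micro- and macro-averages can differ because they weight classes differently: a macro-average takes the unweighted mean of per-class values, whereas a micro-average pools the counts across classes before computing the metric. The sketch below illustrates this for sensitivity under the usual definitions; the function names and the counts are hypothetical and are not taken from the study.

```python
def macro_sensitivity(per_class: dict) -> float:
    """Unweighted mean of per-class sensitivities.

    per_class maps a class name to a (true_positives, false_negatives) pair.
    """
    values = [tp / (tp + fn) for tp, fn in per_class.values()]
    return sum(values) / len(values)


def micro_sensitivity(per_class: dict) -> float:
    """Sensitivity computed after pooling counts over all classes."""
    total_tp = sum(tp for tp, _ in per_class.values())
    total_fn = sum(fn for _, fn in per_class.values())
    return total_tp / (total_tp + total_fn)


if __name__ == "__main__":
    # Made-up, deliberately imbalanced counts; not data from the study.
    counts = {"common_class": (90, 10), "rare_class": (5, 5)}
    print("macro:", macro_sensitivity(counts))
    print("micro:", micro_sensitivity(counts))
```

With imbalanced classes such as these, the micro-average is dominated by the larger class, while the macro-average is pulled down by poor performance on the rare class.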
The developed model significantly improved the accuracy of detecting and classifying hip fractures compared with human diagnosis alone. It shows great potential for assisting clinicians in the accurate diagnosis and classification of hip fractures.