Oude Nijhuis Koen D, Prijs Jasper, Barvelink Britt, van Luit Hans, Zhao Yang, Liao Zhibin, Jaarsma Ruurd L, IJpma Frank F A, Wijffels Mathieu M E, Doornberg Job N, Colaris Joost W
Department of Orthopedic Surgery, University Medical Centre Groningen and Groningen University, Groningen, The Netherlands.
Department of Trauma Surgery, University Medical Centre Groningen and Groningen University, Hanzeplein 1, 9713PZ, Groningen, The Netherlands.
Eur J Trauma Emerg Surg. 2025 Jul 21;51(1):261. doi: 10.1007/s00068-025-02931-6.
PURPOSE: Convolutional Neural Networks (CNNs) have shown promise in fracture detection, but their ability to improve surgeons' inconsistent fracture classification remains unstudied. Therefore, our aim was to create and externally validate an open-source CNN algorithm that classifies distal radius fractures (DRFs) according to the AO/OTA classification system.
METHODS: Patients with postero-anterior, lateral and oblique radiographs were included. Radiographs were classified according to the AO/OTA classification and were used to train a CNN algorithm. The algorithm was tested on an internal validation set and an external validation set (two other level 1 trauma centers), with the DRFs classified by three independent surgeons.
RESULTS: 659 radiographs were used to train the algorithm. The internal and external validation sets contained 190 and 188 patients, respectively. On internal validation, the CNN had an accuracy of 62% and an area under the receiver operating characteristic curve (AUC) of 0.63-0.93 (type 2R3A 0.84, type 2R3B 0.63, type 2R3C 0.75, and no DRF 0.93). On external validation, the algorithm had an accuracy of 61% and an AUC of 0.56-0.88 (type 2R3A 0.82, type 2R3B 0.56, type 2R3C 0.75, and no DRF 0.88).
CONCLUSION: The presented algorithm demonstrated excellent accuracy in classifying type 2R3A DRFs and in excluding DRFs. However, its accuracy in classifying type 2R3B and 2R3C DRFs according to the AO/OTA system was poor to moderate, similar to the limited inter-observer agreement among surgeons. These results show that, despite their previous excellence in fracture detection, CNN algorithms struggle with fracture classification, potentially reflecting inherent problems with these classification systems.
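The abstract reports an overall accuracy and one AUC per class but does not state how these were computed. The following is a minimal sketch, assuming accuracy is taken from the highest-probability class and each AUC is a one-vs-rest value for that class. The function names and the toy labels and probabilities are hypothetical and invented for illustration; they are not data from the study.

```python
from typing import List, Sequence

# Class order assumed for the probability vectors (hypothetical layout).
CLASSES = ["2R3A", "2R3B", "2R3C", "no DRF"]


def accuracy(y_true: Sequence[str], probs: Sequence[Sequence[float]]) -> float:
    """Fraction of cases whose highest-probability class matches the reference label."""
    correct = 0
    for label, p in zip(y_true, probs):
        predicted = CLASSES[max(range(len(CLASSES)), key=lambda i: p[i])]
        correct += predicted == label
    return correct / len(y_true)


def one_vs_rest_auc(y_true: Sequence[str], probs: Sequence[Sequence[float]], cls: str) -> float:
    """AUC for one class against all others.

    Computed as the probability that a randomly chosen case of `cls`
    receives a higher score for `cls` than a randomly chosen case of any
    other class; ties count as one half.
    """
    k = CLASSES.index(cls)
    pos: List[float] = [p[k] for label, p in zip(y_true, probs) if label == cls]
    neg: List[float] = [p[k] for label, p in zip(y_true, probs) if label != cls]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy example: six made-up cases, NOT taken from the study.
y_true = ["2R3A", "2R3B", "2R3C", "no DRF", "2R3B", "2R3C"]
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.30, 0.40, 0.10],
    [0.10, 0.30, 0.50, 0.10],
    [0.05, 0.05, 0.10, 0.80],
    [0.10, 0.50, 0.30, 0.10],
    [0.10, 0.45, 0.35, 0.10],
]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(y_true, probs):.2f}")
    for cls in CLASSES:
        print(f"AUC {cls}: {one_vs_rest_auc(y_true, probs, cls):.2f}")
```

In the toy data the 2R3B and 2R3C cases are confused with each other while 2R3A and "no DRF" are cleanly separated, so the per-class AUCs differ even though there is a single overall accuracy. This is the same kind of per-class breakdown the abstract reports.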