Department of Orthopaedic Surgery, Groningen University Medical Centre, Groningen, The Netherlands.
Department of Surgery, Groningen University Medical Centre, Groningen, The Netherlands.
Eur J Trauma Emerg Surg. 2023 Apr;49(2):1057-1069. doi: 10.1007/s00068-022-02136-1. Epub 2022 Nov 14.
Convolutional neural networks (CNNs) are increasingly being developed for automated fracture detection in orthopaedic trauma surgery. Studies to date, however, are limited to classifying the entire image and produce only heatmaps for approximate fracture localization rather than delineating exact fracture morphology. Therefore, we aimed to answer (1) what is the performance of a CNN that detects, classifies, localizes, and segments an ankle fracture, and (2) is this performance externally valid?
The training set included 326 isolated fibula fractures and 423 non-fracture radiographs. The Detectron2 implementation of Mask R-CNN was trained on labelled and annotated radiographs. The internal validation set (or 'test set') and the external validation set consisted of 300 and 334 radiographs, respectively. Consensus among three experienced fellowship-trained trauma surgeons was defined as the ground truth label. Diagnostic accuracy and area under the receiver operating characteristic curve (AUC) were used to assess classification performance. The Intersection over Union (IoU) was used to quantify the accuracy of the CNN's segmentation predictions, where a value of 0.5 is generally considered to indicate adequate segmentation.
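As an illustration of the IoU metric named above, the following is a minimal sketch in plain Python, not the study's evaluation code. The function names `box_iou` and `mask_iou`, the `(x1, y1, x2, y2)` box format, and the nested-list mask format are assumptions made for this example only.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2). Hypothetical helper."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred, truth):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1  # pixel is in both masks
            if p or t:
                union += 1  # pixel is in at least one mask
    return inter / union if union > 0 else 0.0


# Toy inputs, invented for illustration (not data from the study).
toy_box_iou = box_iou((0, 0, 4, 4), (2, 2, 6, 6))
toy_mask_iou = mask_iou([[1, 1, 0], [0, 1, 0]],
                        [[1, 0, 0], [0, 1, 1]])
```

In the toy masks, two pixels overlap out of four covered by either mask, so the mask IoU sits exactly at the 0.5 threshold mentioned in the text.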
The final CNN was able to classify fibula fractures into four classes (Danis-Weber A, B, C and No Fracture) with AUC values ranging from 0.93 to 0.99. Diagnostic accuracy was 89% on the test set, with an average sensitivity of 89% and specificity of 96%. On the external validation set of radiographs from a different hospital, accuracy was 89-90%. Per-class accuracy (%) and AUC were 100/0.99 for the 'No Fracture' class, 92/0.99 for 'Weber B', 88/0.93 for 'Weber C', and 76/0.97 for 'Weber A'. For the fracture bounding box predictions by the CNN, a mean IoU of 0.65 (SD ± 0.16) was observed. The fracture segmentation predictions by the CNN resulted in a mean IoU of 0.47 (SD ± 0.17).
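For readers unfamiliar with how accuracy, sensitivity, and specificity can be derived for a four-class problem, the sketch below uses a one-vs-rest reading of a confusion matrix. The abstract does not state how the averages were computed, so the macro-average used here is an assumption, and the confusion matrix is a made-up toy example, not data from the study.

```python
LABELS = ["Weber A", "Weber B", "Weber C", "No Fracture"]

# TOY confusion matrix (rows = true class, columns = predicted class).
# These counts are invented for illustration only.
toy_cm = [
    [8, 2, 0, 0],
    [1, 9, 0, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
]


def one_vs_rest(cm, k):
    """Sensitivity and specificity for class k treated as 'positive'."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                # true k, predicted as another class
    fp = sum(row[k] for row in cm) - tp  # other class, predicted as k
    tn = total - tp - fn - fp
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def overall_accuracy(cm):
    """Fraction of all cases on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


per_class = {LABELS[k]: one_vs_rest(toy_cm, k) for k in range(len(LABELS))}
# Macro-average: unweighted mean over classes (an assumed averaging scheme).
macro_sens = sum(s for s, _ in per_class.values()) / len(per_class)
macro_spec = sum(sp for _, sp in per_class.values()) / len(per_class)
accuracy = overall_accuracy(toy_cm)
```

A macro-average weights each class equally regardless of how many radiographs it contains; a prevalence-weighted average would give different numbers when classes are imbalanced.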
This study offers a look into the 'black box' of CNNs and represents the first automated delineation (segmentation) of fracture lines on (ankle) radiographs. The AUC values presented in this paper indicate good discriminatory capability of the CNN and support further study of CNNs for detecting and classifying ankle fractures.
Level of evidence: II, diagnostic imaging study.