a Department of Orthopaedic Surgery and.
b Department of Dermatology, I-dermatology clinic, Seoul.
Acta Orthop. 2018 Aug;89(4):468-473. doi: 10.1080/17453674.2018.1453714. Epub 2018 Mar 26.
Background and purpose - We aimed to evaluate the ability of artificial intelligence (a deep learning algorithm) to detect and classify proximal humerus fractures on plain anteroposterior (AP) shoulder radiographs.
Patients and methods - We evaluated 1,891 images (1 image per person) of normal shoulders (n = 515) and 4 proximal humerus fracture types (greater tuberosity, 346; surgical neck, 514; 3-part, 269; 4-part, 247), as classified by 3 specialists. We trained a deep convolutional neural network (CNN) after augmentation of the training dataset. The ability of the CNN to detect and classify proximal humerus fractures, as measured by top-1 accuracy, area under the receiver operating characteristic curve (AUC), sensitivity/specificity, and Youden index, was compared with that of humans (28 general physicians, 11 general orthopedists, and 19 orthopedists specialized in the shoulder).
Results - The CNN showed high performance in distinguishing normal shoulders from proximal humerus fractures: 96% top-1 accuracy, 1.00 AUC, 0.99/0.97 sensitivity/specificity, and 0.97 Youden index. It also showed promising results in classifying fracture type: 65-86% top-1 accuracy, 0.90-0.98 AUC, 0.88/0.83-0.97/0.94 sensitivity/specificity, and 0.71-0.90 Youden index. Compared with the human groups, the CNN performed better than general physicians and general orthopedists and similarly to orthopedists specialized in the shoulder; its advantage was more marked in complex 3- and 4-part fractures.
Interpretation - Artificial intelligence can accurately detect and classify proximal humerus fractures on plain shoulder AP radiographs. Further studies are necessary to determine the feasibility of applying artificial intelligence in the clinic and whether its use could improve care and outcomes compared with current orthopedic assessments.
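For readers unfamiliar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, the Youden index (sensitivity + specificity - 1), and top-1 accuracy are conventionally computed. It is not the authors' code: the function names, the confusion-matrix counts, and the class labels are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, Youden index) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of fractures correctly flagged
    specificity = tn / (tn + fp)  # fraction of normal shoulders correctly cleared
    youden = sensitivity + specificity - 1
    return sensitivity, specificity, youden


def top1_accuracy(true_labels, predicted_labels):
    """Fraction of images whose single highest-scoring class matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    sens, spec, j = binary_metrics(tp=99, fn=1, tn=97, fp=3)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} Youden={j:.2f}")

    # Hypothetical labels for the five classes named in the abstract.
    truth = ["normal", "greater tuberosity", "surgical neck", "3-part"]
    preds = ["normal", "greater tuberosity", "surgical neck", "4-part"]
    print(f"top-1 accuracy={top1_accuracy(truth, preds):.2f}")
```

Recomputing the Youden index from the rounded sensitivity and specificity in the abstract can differ from the reported value in the second decimal place (for example, 0.99 + 0.97 - 1 = 0.96 versus the reported 0.97), presumably because the published figure was derived from unrounded values.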