D. W. G. Langerhuizen, S. J. Janssen, G. M. M. J. Kerkhoffs, Department of Orthopaedic Surgery, Amsterdam Movement Sciences (AMS), Amsterdam University Medical Centre, Amsterdam, The Netherlands.
A. E. J. Bulstra, R. L. Jaarsma, J. N. Doornberg, Flinders University, Department of Orthopaedic & Trauma Surgery, Flinders Medical Centre, Adelaide, Australia.
Clin Orthop Relat Res. 2020 Nov;478(11):2653-2659. doi: 10.1097/CORR.0000000000001318.
BACKGROUND: Preliminary experience suggests that deep learning algorithms are nearly as good as humans at detecting common, displaced, and relatively obvious fractures (such as distal radius or hip fractures). However, it is not known whether this is also true for subtle or relatively nondisplaced fractures that are often difficult to see on radiographs, such as scaphoid fractures.
QUESTIONS/PURPOSES: (1) What is the diagnostic accuracy, sensitivity, and specificity of a deep learning algorithm in detecting radiographically visible and occult scaphoid fractures using four radiographic imaging views? (2) Does adding patient demographic information (age and sex) improve the diagnostic performance of the deep learning algorithm? (3) Do orthopaedic surgeons have better diagnostic accuracy, sensitivity, and specificity than the deep learning algorithm? (4) What is the interobserver reliability among five human observers, and what is the reliability between the human consensus and the deep learning algorithm?
METHODS: We retrospectively searched the picture archiving and communication system (PACS) for patients with a radiographic scaphoid series until we had identified 300 patients: 150 with fractures (127 visible on radiographs and 23 visible only on MRI) and 150 without fractures, each with a corresponding CT or MRI as the reference standard for fracture diagnosis. At our institution, an MRI is usually ordered for patients with scaphoid tenderness and normal radiographs, and a CT for patients with a radiographically visible scaphoid fracture. We used a deep learning algorithm (a convolutional neural network [CNN]) for automated fracture detection on radiographs. Deep learning, an advanced subset of artificial intelligence, combines layers of artificial neurons loosely modeled on biological neurons. CNNs, deep learning architectures inspired by the interconnected neurons of the human brain, are the type most commonly used for image analysis. The area under the receiver operating characteristic curve (AUC) was used to evaluate the algorithm's diagnostic performance. An AUC of 1.0 indicates perfect prediction, whereas 0.5 indicates that a prediction is no better than a coin flip. The probability of a scaphoid fracture generated by the CNN, sex, and age were included in a multivariable logistic regression to determine whether this would improve the algorithm's diagnostic performance. Diagnostic performance characteristics (accuracy, sensitivity, and specificity) and reliability (kappa statistic) were calculated for the CNN and for the five orthopaedic surgeon observers in our study.
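As a minimal illustration of the type of analysis described above (not the authors' actual pipeline), the following Python sketch shows how a CNN-derived fracture probability could be combined with age and sex in a multivariable logistic regression and evaluated with the AUC. All variable names and data are hypothetical.

```python
# Hypothetical sketch: combine a CNN fracture probability with age and sex
# in a multivariable logistic regression and evaluate with AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical per-patient data (not from the study)
rng = np.random.default_rng(0)
n = 300
cnn_prob = rng.uniform(0, 1, n)                      # CNN-generated fracture probability
age = rng.integers(18, 80, n)                        # patient age in years
sex = rng.integers(0, 2, n)                          # 0 = female, 1 = male
y = (rng.uniform(0, 1, n) < cnn_prob).astype(int)    # reference-standard label (CT/MRI)

# AUC of the CNN probability alone
auc_cnn = roc_auc_score(y, cnn_prob)

# Multivariable logistic regression: CNN probability + demographics
X = np.column_stack([cnn_prob, age, sex])
model = LogisticRegression().fit(X, y)
auc_combined = roc_auc_score(y, model.predict_proba(X)[:, 1])

print(f"AUC (CNN alone): {auc_cnn:.2f}")
print(f"AUC (CNN + age + sex): {auc_combined:.2f}")
```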
RESULTS: The algorithm had an AUC of 0.77 (95% CI 0.66 to 0.85), 72% accuracy (95% CI 60% to 84%), 84% sensitivity (95% CI 74% to 94%), and 60% specificity (95% CI 46% to 74%). Adding age and sex did not improve diagnostic performance (AUC 0.81 [95% CI 0.73 to 0.89]). Orthopaedic surgeons had better specificity (93% [95% CI 93% to 99%]; p < 0.01), while accuracy (84% [95% CI 81% to 88%]) and sensitivity (76% [95% CI 70% to 82%]; p = 0.29) did not differ between the algorithm and the human observers. Although the CNN was less specific in diagnosing relatively obvious fractures, it detected five of six occult scaphoid fractures that were missed by all human observers. The interobserver reliability among the five surgeons was substantial (Fleiss' kappa = 0.74 [95% CI 0.66 to 0.83]), but the reliability between the algorithm and the human observers was only fair (Cohen's kappa = 0.34 [95% CI 0.17 to 0.50]).
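To illustrate the reliability statistics reported above, here is a minimal sketch (with made-up binary ratings, not the study data) of computing Cohen's kappa between the algorithm and the human consensus and Fleiss' kappa among multiple observers, assuming scikit-learn and statsmodels are available.

```python
# Hypothetical sketch: agreement statistics of the kind reported in the abstract.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import fleiss_kappa, aggregate_raters

# Made-up binary ratings (1 = fracture, 0 = no fracture) for 10 cases
algorithm = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
consensus = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

# Cohen's kappa: algorithm vs. human consensus
print("Cohen's kappa:", cohen_kappa_score(algorithm, consensus))

# Made-up ratings from five observers (rows = cases, columns = observers)
ratings = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
])

# Fleiss' kappa: agreement among the five observers
table, _ = aggregate_raters(ratings)  # per-case counts for each rating category
print("Fleiss' kappa:", fleiss_kappa(table))
```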
CONCLUSION: Initial experience with our deep learning algorithm suggests that it has trouble identifying scaphoid fractures that are obvious to human observers. The CNN made 13 false-positive suggestions that all five surgeons correctly identified as non-fractures. Research with larger datasets (preferably also including information from physical examination) or further algorithm refinement is merited.
LEVEL OF EVIDENCE: Level III, diagnostic study.