Does Artificial Intelligence Outperform Natural Intelligence in Interpreting Musculoskeletal Radiological Studies? A Systematic Review.

Author Information

Groot Olivier Q, Bongers Michiel E R, Ogink Paul T, Senders Joeky T, Karhade Aditya V, Bramer Jos A M, Verlaan Jorrit-Jan, Schwab Joseph H

Affiliations

O. Q. Groot, M. E. R. Bongers, A. V. Karhade, J. H. Schwab, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

P. T. Ogink, J.-J. Verlaan, Department of Orthopaedic Surgery, University Medical Center Utrecht, Utrecht, the Netherlands.

Publication Information

Clin Orthop Relat Res. 2020 Dec;478(12):2751-2764. doi: 10.1097/CORR.0000000000001360.

Abstract

BACKGROUND

Machine learning (ML) is a subdomain of artificial intelligence that enables computers to abstract patterns from data without explicit programming. A myriad of impactful ML applications already exists in orthopaedics, ranging from predicting infections after surgery to diagnostic imaging. However, no systematic review that we know of has compared the performance of ML models with that of clinicians in musculoskeletal imaging to provide an up-to-date summary of the extent to which ML is applied to imaging diagnoses. This review therefore examines where current ML developments stand in aiding orthopaedists in assessing musculoskeletal images.

QUESTIONS/PURPOSES

This systematic review aimed (1) to compare the performance of ML models versus clinicians in detecting, differentiating, or classifying orthopaedic abnormalities on imaging by (A) accuracy, sensitivity, and specificity, (B) input features (for example, plain radiographs, MRI scans, ultrasound), and (C) clinician specialties; and (2) to compare the performance of clinicians aided by ML models with that of unaided clinicians.
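The three metrics listed under (A) are the standard quantities for a binary detection task. As a minimal illustration (not code from the review), they can be computed from a 2 x 2 confusion matrix of abnormal-versus-normal reads as follows, in Python:

# Standard definitions of the three performance metrics compared in this review,
# computed from a binary (abnormality present vs. absent) confusion matrix.
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Proportion of all studies that are read correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp: int, fn: int) -> float:
    # Proportion of truly abnormal studies flagged as abnormal.
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # Proportion of truly normal studies read as normal.
    return tn / (tn + fp)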

METHODS

A systematic review was performed in PubMed, Embase, and the Cochrane Library for studies published up to October 1, 2019, using synonyms for machine learning and all potential orthopaedic specialties. We included all studies that compared ML models head-to-head against clinicians in the binary detection of abnormalities in musculoskeletal images. After screening 6531 studies, we ultimately included 12 studies. We conducted quality assessment using the Methodological Index for Non-randomized Studies (MINORS) checklist. All 12 studies were of comparable quality, and they all clearly included six of the eight critical appraisal items (study aim, input feature, ground truth, ML versus human comparison, performance metric, and ML model description). This justified summarizing the findings in a quantitative form by calculating the median absolute improvement of the ML models compared with clinicians for the following metrics of performance: accuracy, sensitivity, and specificity.
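As a minimal sketch of the quantitative summary described above (this is not the authors' code), the pooling step amounts to taking, for each metric, the per-study difference between the ML model and the clinicians and summarizing those differences with a median and interquartile range. The study values below are hypothetical placeholders, not data from the review:

from statistics import median, quantiles

def median_absolute_improvement(ml_scores, clinician_scores):
    # Median and IQR of per-study (ML minus clinician) differences for one metric.
    diffs = [ml - clin for ml, clin in zip(ml_scores, clinician_scores)]
    q1, _, q3 = quantiles(diffs, n=4)  # quartile cut points
    return median(diffs), (q1, q3)

# Hypothetical per-study accuracies (proportions), not values from the included studies.
ml_accuracy        = [0.90, 0.85, 0.88, 0.79, 0.93]
clinician_accuracy = [0.87, 0.86, 0.84, 0.80, 0.90]
print(median_absolute_improvement(ml_accuracy, clinician_accuracy))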

RESULTS

ML models provided, in aggregate, only very slight improvements in diagnostic accuracy and sensitivity compared with clinicians working alone and were on par in specificity (3% (interquartile range [IQR] -2.0% to 7.5%), 0.06% (IQR -0.03 to 0.14), and 0.00 (IQR -0.048 to 0.048), respectively). Inputs used by the ML models were plain radiographs (n = 8), MRI scans (n = 3), and ultrasound examinations (n = 1). Overall, ML models outperformed clinicians more when interpreting plain radiographs than when interpreting MRIs (17 of 34 and 3 of 16 performance comparisons, respectively). Orthopaedists and radiologists performed similarly to ML models, while ML models mostly outperformed other clinicians (outperformance in 7 of 19, 7 of 23, and 6 of 10 performance comparisons, respectively). Two studies evaluated the performance of clinicians aided and unaided by ML models; both demonstrated considerable improvements in ML-aided clinician performance by reporting a 47% decrease of misinterpretation rate (95% confidence interval [CI] 37 to 54; p < 0.001) and a mean increase in specificity of 0.048 (95% CI 0.029 to 0.068; p < 0.001) in detecting abnormalities on musculoskeletal images.

CONCLUSIONS

At present, ML models have comparable performance to clinicians in assessing musculoskeletal images. ML models may enhance the performance of clinicians as a technical supplement rather than as a replacement for clinical intelligence. Future ML-related studies should emphasize how ML models can complement clinicians, instead of determining the overall superiority of one versus the other. This can be accomplished by improving transparent reporting, diminishing bias, determining the feasibility of implementation in the clinical setting, and appropriately tempering conclusions.

LEVEL OF EVIDENCE

Level III, diagnostic study.
