Lans Amanda, Pierik Robertus J B, Bales John R, Fourman Mitchell S, Shin David, Kanbier Laura N, Rifkin Jack, DiGiovanni William H, Chopra Rohan R, Moeinzad Rana, Verlaan Jorrit-Jan, Schwab Joseph H
Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, 55 Fruit Street, Boston, MA 02114, United States of America; Department of Orthopaedic Surgery, University Medical Center Utrecht - Utrecht University, Heidelberglaan 100, 3584 CX Utrecht, the Netherlands.
Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, 55 Fruit Street, Boston, MA 02114, United States of America.
Artif Intell Med. 2022 Oct;132:102396. doi: 10.1016/j.artmed.2022.102396. Epub 2022 Sep 6.
Machine learning (ML) models are emerging rapidly in orthopaedic imaging because they can facilitate timely diagnostic and treatment decision making. However, despite a considerable increase in model development and ML-related publications, the quality of these studies has rarely been evaluated. To move forward with implementing ML models for diagnostic imaging in orthopaedics, models must be held to a high standard and provide applicable, reliable and accurate results. Several guidelines and tools have been developed to help authors and reviewers of ML models, such as the Checklist for AI in Medical Imaging (CLAIM) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Previous investigations of prognostic orthopaedic ML models have raised concerns about the rate of transparent reporting. An assessment of whether ML models for diagnostic imaging in orthopaedics adequately and clearly report the essential facets of their model development is therefore warranted.
To evaluate (1) the completeness of reporting according to the CLAIM checklist and (2) the risk of bias according to the QUADAS-2 tool for ML-based orthopaedic diagnostic imaging models. This study also sought to identify ML details that researchers commonly fail to report and to provide recommendations to improve reporting standards for diagnostic imaging ML models.
A systematic review was performed to identify ML-based diagnostic imaging models in orthopaedic surgery. Articles published within the last 5 years were included. Two reviewers independently extracted data using the CLAIM checklist and QUADAS-2 tool, and discrepancies were resolved by discussion with at least two additional reviewers.
After screening 7507 articles, 91 met the study criteria. The mean completeness of CLAIM items was 63 % (SD ± 28 %). Among the worst reported CLAIM items were item 28 (metrics of model performance), item 13 (the handling of missing data) and item 9 (data preprocessing steps), with only 2 % (2/91), 8 % (7/91) and 13 % (12/91) of studies correctly reporting these items, respectively. The QUADAS-2 tool revealed that the patient selection domain was at the highest risk of bias: 18 % (16/91) of studies were at high risk of bias and 32 % (29/91) had an unknown risk of bias.
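As a quick arithmetic check, the percentages reported above can be recomputed from the stated counts. The following is a minimal sketch, not the authors' analysis code: the dictionary labels and the helper name `percent_of_studies` are illustrative assumptions, and rounding to the nearest whole percent is assumed from the way the figures are presented.

```python
# Counts are taken from the results reported above.
# Labels and function name are hypothetical, chosen for illustration only.
TOTAL_STUDIES = 91

reported_counts = {
    "CLAIM item 28 (metrics of model performance)": 2,
    "CLAIM item 13 (handling of missing data)": 7,
    "CLAIM item 9 (data preprocessing steps)": 12,
    "QUADAS-2 patient selection, high risk of bias": 16,
    "QUADAS-2 patient selection, unknown risk of bias": 29,
}


def percent_of_studies(count, total=TOTAL_STUDIES):
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Recompute each percentage from its count.
percentages = {
    label: percent_of_studies(count) for label, count in reported_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {reported_counts[label]}/{TOTAL_STUDIES} = {pct} %")
```

Under these assumptions the recomputed values agree with the reported 2 %, 8 %, 13 %, 18 % and 32 %.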
This review demonstrates that diagnostic ML studies in orthopaedic imaging report relevant information, such as the handling of missing data and data preprocessing steps, to only a limited extent. Additionally, a substantial number of studies were at high risk of bias. Future studies describing ML-based models for diagnostic imaging should adhere to acknowledged methodological standards to maximize the quality and applicability of their models.