Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA;
Department of Orthopedic Surgery, University Medical Center Utrecht, Utrecht University, The Netherlands.
Acta Orthop. 2021 Aug;92(4):385-393. doi: 10.1080/17453674.2021.1910448. Epub 2021 Apr 18.
Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparency of reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.

Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion of externally validated models was determined relative to 59 ML prediction models for orthopedic surgical outcomes, published up to June 18, 2020, that our group had previously identified as having only internal validation. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. Transparency of reporting was assessed against the TRIPOD guidelines.

Results - After screening 4,682 studies, we included 18 studies that externally validated 10 of the 59 available ML prediction models. All external validations identified in this review retained good discrimination. Other key performance measures were reported in only 3 studies, making overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), and 6 items were reported in fewer than 4 of the 18 studies.

Interpretation - Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, which limits a transparent examination of model performance. Further prospective studies that adhere to existing guidelines are needed to validate or refute the many predictive ML models in orthopedics, so that clinicians can take full advantage of validated, clinically implementable ML decision tools.
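The performance measures named in the methods (discrimination, calibration, and decision-curve analysis) can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' analysis code. It computes the c-statistic (discrimination) by pairwise comparison of event and non-event cases, calibration-in-the-large as observed event rate minus mean predicted risk, the Brier score as an overall accuracy summary, and the net benefit used in decision-curve analysis, NB = TP/n - FP/n * pt/(1 - pt) at threshold probability pt. All function names and the toy data are hypothetical.

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Discrimination: probability that a randomly chosen event case has a
    higher predicted risk than a randomly chosen non-event case (ties = 0.5)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_in_the_large(y: Sequence[int], p: Sequence[float]) -> float:
    """Observed event rate minus mean predicted risk (0 means calibrated on average)."""
    return sum(y) / len(y) - sum(p) / len(p)


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Decision-curve analysis: net benefit of treating every patient whose
    predicted risk is >= threshold, i.e. TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(y: Sequence[int], threshold: float) -> float:
    """Reference strategy on a decision curve: treat every patient."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical external-validation cohort: observed outcomes and model risks.
    outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
    risks = [0.8, 0.3, 0.6, 0.2, 0.4, 0.35, 0.1, 0.5]
    print("c-statistic:", c_statistic(outcomes, risks))
    print("calibration-in-the-large:", calibration_in_the_large(outcomes, risks))
    print("Brier score:", brier_score(outcomes, risks))
    for pt in (0.1, 0.3, 0.5):
        print(pt, net_benefit(outcomes, risks, pt), treat_all_net_benefit(outcomes, pt))
```

A model is clinically useful at a given threshold only if its net benefit exceeds both "treat all" and "treat none" (net benefit 0). A fuller calibration assessment would also report the calibration slope, typically obtained by regressing the outcome on the logit of the predicted risk; that step is omitted here for brevity.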