Blackman Benjamin, Vivekanantha Prushoth, Mughal Rafay, Pareek Ayoosh, Bozzo Anthony, Samuelsson Kristian, de Sa Darren
School of Medicine, University of Limerick, Limerick, Ireland.
Division of Orthopaedic Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada.
BMC Musculoskelet Disord. 2025 Jan 4;26(1):16. doi: 10.1186/s12891-024-08228-w.
To summarize the statistical performance of machine learning models in predicting revision, secondary knee injury, or reoperation following anterior cruciate ligament reconstruction (ACLR).
Three online databases (PubMed, MEDLINE, EMBASE) were searched from database inception to February 6, 2024, to identify literature on the use of machine learning to predict revision, secondary knee injury (e.g. anterior cruciate ligament (ACL) or meniscus), or reoperation in ACLR. The authors adhered to the PRISMA and R-AMSTAR guidelines as well as the Cochrane Handbook for Systematic Reviews of Interventions. Demographic data and machine learning specifics were recorded. Model performance was recorded using discrimination, area under the curve (AUC), concordance, calibration, and Brier score. Factors deemed predictive for revision, secondary injury or reoperation were also extracted. The MINORS criteria were used for methodological quality assessment.
Nine studies comprising 125,427 patients with a mean follow-up of 5.82 (0.08-12.3) years were included in this review. Two of nine (22.2%) studies served as external validation analyses. Five (55.6%) studies reported mean AUC (strongest model range: 0.77-0.997). Four (44.4%) studies reported mean concordance (strongest model range: 0.67-0.713). Two studies reported Brier score, calibration intercept, and calibration slope, with values among the highest-performing models ranging over 0.10-0.18, 0.0051-0.006, and 0.96-0.97, respectively. Four studies reported calibration error, and all four found significant miscalibration at either the two- or five-year follow-up, affecting 10 of the 14 models assessed.
Machine learning models designed to predict the risk of revision or secondary knee injury demonstrate variable discriminatory performance when evaluated with AUC or concordance metrics. Calibration is also variable, with several models showing evidence of miscalibration at the two- or five-year mark. The lack of external validation of existing models limits the generalizability of these findings. Future research should focus on validating current models in addition to developing new multimodal neural networks to improve accuracy and reliability.
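For readers less familiar with the metrics named above, the following is a minimal illustrative sketch of how AUC (discrimination) and the Brier score (overall accuracy of predicted probabilities) can be computed for a binary outcome such as revision versus no revision. It is not taken from any of the included studies; the function names `auc` and `brier_score` and the four-patient toy data are assumptions made purely for illustration.

```python
def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    (e.g. a revision) receives a higher predicted risk than a
    randomly chosen negative case; ties count as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome.

    0 is perfect; lower is better.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = revision occurred, 0 = no revision.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]

print("AUC:", auc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking. A model can rank patients well yet still be miscalibrated, which is why the review reports calibration measures alongside AUC and concordance.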