Leckey Christopher, van Dyk Nicol, Doherty Cailbhe, Lawlor Aonghus, Delahunt Eamonn
School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland
High Performance Unit, Irish Rugby Football Union, Dublin, Dublin, Ireland.
Br J Sports Med. 2025 Mar 25;59(7):491-500. doi: 10.1136/bjsports-2024-108576.
This study reviewed the current state of machine learning (ML) research on the prediction of sports-related injuries. It aimed to chart the approaches used and to assess their efficacy, taking into account data heterogeneity, model specificity and contextual factors in the development of predictive models.
This was a scoping review.
The databases searched were PubMed, EMBASE, SPORTDiscus and IEEE Xplore.
In total, 1241 studies were identified, 58 full texts were screened, and 38 relevant studies were reviewed and charted. Football (soccer) was the most commonly investigated sport. Area under the curve (AUC) was the most common evaluation metric, reported in 71% of studies. In 60% of studies, tree-based models achieved the highest statistical predictive performance, with Random Forest and Extreme Gradient Boosting (XGBoost) performing best for injury risk prediction. Logistic regression outperformed ML methods in 4 of 12 studies. Three studies reported an AUC above 0.9, yet the clinical relevance of these models is questionable.
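As a hedged illustration of what the most common metric summarises, the sketch below computes AUC from its rank-based definition. It assumes the receiver operating characteristic AUC, which is the probability that a randomly chosen injured case receives a higher score than a randomly chosen uninjured case, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not drawn from any reviewed study.

```python
from itertools import product


def roc_auc(labels, scores):
    """Rank-based ROC AUC: the fraction of (positive, negative) pairs in
    which the positive (injured) case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = injured within the prediction window, 0 = not.
labels = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.8, 0.3, 0.6]
print(round(roc_auc(labels, scores), 3))  # → 0.867
```

Because only the ordering of the scores matters, any order-preserving rescaling of them leaves the AUC unchanged. A high AUC therefore reflects ranking ability alone and does not, by itself, supply a decision threshold or a clinically meaningful risk estimate.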
A variety of ML models have been applied to the prediction of sports-related injuries. Although several studies report strong predictive performance, their clinical utility may be limited by wide prediction windows or broad injury definitions. The efficacy of ML is hampered by small datasets and by methodological heterogeneity (in cohort size, injury definition and dependent variables), both of which were common across the reviewed studies.
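To illustrate why a wide prediction window can limit clinical usefulness, the sketch below applies a simple, hypothetical labelling rule that is not taken from any reviewed study. Under this rule, a monitoring day is labelled positive if an injury onset falls within the following `window_days` days. The function `label_days`, the dates and the single injury are all illustrative assumptions.

```python
from datetime import date, timedelta


def label_days(days, injury_dates, window_days):
    """Hypothetical rule: label a day 1 if any injury onset falls strictly
    after that day and no more than `window_days` days later, else 0."""
    labels = []
    for d in days:
        horizon_end = d + timedelta(days=window_days)
        hit = any(d < inj <= horizon_end for inj in injury_dates)
        labels.append(1 if hit else 0)
    return labels


# Hypothetical athlete: 28 monitored days, one injury onset on the 21st day.
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(28)]
injuries = [start + timedelta(days=20)]

for window in (1, 7, 28):
    labels = label_days(days, injuries, window)
    print(f"window={window:>2} days: {sum(labels)} of {len(labels)} days labelled 'injury'")
```

With a one-day window, only the day before the injury is positive. With a 28-day window, 20 of the 28 monitored days are positive, so a correct "injury" prediction says little about when the injury will actually occur.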