Luu Bryan C, Wright Audrey L, Haeberle Heather S, Karnuta Jaret M, Schickendantz Mark S, Makhni Eric C, Nwachukwu Benedict U, Williams Riley J, Ramkumar Prem N
Department of Orthopedic Surgery, Baylor College of Medicine, Houston, Texas, USA.
Machine Learning Orthopaedics Lab, Cleveland Clinic, Cleveland, Ohio, USA.
Orthop J Sports Med. 2020 Sep 25;8(9):2325967120953404. doi: 10.1177/2325967120953404. eCollection 2020 Sep.
The opportunity to quantitatively predict next-season injury risk in the National Hockey League (NHL) has become a reality with the advent of advanced computational processors and machine learning (ML) architecture. Unlike static regression analyses that provide a momentary prediction, ML algorithms are dynamic in that they can readily incorporate historical data to build a framework that improves as additional data are added.
To (1) characterize the epidemiology of publicly reported NHL injuries from 2007 to 2017, (2) determine the validity of a machine learning model in predicting next-season injury risk for both goalies and position players, and (3) compare the performance of modern ML algorithms versus logistic regression (LR) analyses.
Descriptive epidemiology study.
Professional NHL player data were compiled for the years 2007 to 2017 from 2 publicly reported databases in the absence of an official NHL-approved database. Attributes acquired for each NHL player for each professional year included age, 85 performance metrics, and injury history. A total of 5 ML algorithms were created for both position player and goalie data: random forest, K-nearest neighbors, naïve Bayes, XGBoost, and Top 3 Ensemble. LR was also performed for both position player and goalie data. Model validity was assessed primarily by the area under the receiver operating characteristic curve (AUC).
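As an illustration of the validation metric only, the following is a minimal sketch of how an AUC can be computed from binary injury labels and model scores. It uses the pairwise-ranking definition: the probability that a randomly chosen injured player receives a higher score than a randomly chosen uninjured player, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study, which does not describe its implementation.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = injured, 0 = not injured).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 injured and 3 uninjured player-seasons.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the models in this study are compared.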
Player data were generated from 2109 position players and 213 goalies. For models predicting next-season injury risk for position players, XGBoost performed the best with an AUC of 0.948, compared with an AUC of 0.937 for LR (P < .0001). For models predicting next-season injury risk for goalies, XGBoost had the highest AUC with 0.956, compared with an AUC of 0.947 for LR (P < .0001).
Advanced ML models such as XGBoost outperformed LR and demonstrated good to excellent capability of predicting whether a publicly reportable injury is likely to occur the next season.