Karnuta Jaret M, Luu Bryan C, Haeberle Heather S, Saluan Paul M, Frangiamore Salvatore J, Stearns Kim L, Farrow Lutul D, Nwachukwu Benedict U, Verma Nikhil N, Makhni Eric C, Schickendantz Mark S, Ramkumar Prem N
Orthopaedic Machine Learning Laboratory, Cleveland Clinic, Cleveland, Ohio, USA.
Department of Orthopedic Surgery, Baylor College of Medicine, Houston, Texas, USA.
Orthop J Sports Med. 2020 Nov 11;8(11):2325967120963046. doi: 10.1177/2325967120963046. eCollection 2020 Nov.
Machine learning (ML) allows for the development of a predictive algorithm capable of ingesting historical data on a Major League Baseball (MLB) player to accurately project the player's future availability.
This study aimed to determine the validity of an ML model in predicting next-season injury risk and anatomic injury location for both position players and pitchers in MLB.
Study design: descriptive epidemiology study.
Using 4 online baseball databases, we compiled MLB player data, including age, performance metrics, and injury history. A total of 84 ML algorithms were developed. Each algorithm output whether the player would sustain an injury the following season, as well as the injury's anatomic site. Validation was based primarily on the area under the receiver operating characteristic curve (AUC).
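As an illustration of the validation metric named above, the following minimal sketch computes the AUC directly from its rank interpretation: the probability that a randomly chosen injured player receives a higher predicted risk than a randomly chosen uninjured player. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study; the abstract does not describe how the authors computed the AUC.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = injured the following season, 0 = not injured (hypothetical coding)
    scores: predicted injury risk, higher means more likely injured
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not from the study).
injured_next_season = [1, 0, 1, 0, 0, 1]
predicted_risk = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(injured_next_season, predicted_risk)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.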
Player data were generated from 1931 position players and 1245 pitchers, with a mean follow-up of 4.40 years (13,982 player-years) between the years of 2000 and 2017. Injured players spent a total of 108,656 days on the disabled list, with a mean of 34.21 total days per player. The mean AUC for predicting next-season injuries was 0.76 among position players and 0.65 among pitchers using the top 3 ensemble classification. Back injuries had the highest AUC among both position players and pitchers, at 0.73. Advanced ML models outperformed logistic regression in 13 of 14 cases.
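The summary means reported above can be checked against the reported totals with simple arithmetic. The sketch below assumes that both means use the full cohort (position players plus pitchers) as the denominator. That assumption is an inference from the numbers, not something the abstract states, and the variable names are illustrative.

```python
# Figures as reported in the abstract.
position_players = 1931
pitchers = 1245
player_years = 13_982
disabled_list_days = 108_656

# Assumption: both means are taken over the full cohort.
total_players = position_players + pitchers

mean_follow_up_years = player_years / total_players
mean_dl_days_per_player = disabled_list_days / total_players

print(f"players: {total_players}")
print(f"mean follow-up: {mean_follow_up_years:.2f} years")
print(f"mean disabled-list days per player: {mean_dl_days_per_player:.2f}")
```

Under that assumption the computed values round to the reported 4.40 years and 34.21 days, which suggests the per-player mean is taken over all players in the cohort rather than over injured players only.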
Advanced ML models generally outperformed logistic regression and demonstrated fair capability in predicting publicly reportable next-season injuries, including the anatomic region for position players, although not for pitchers.