United States Olympic & Paralympic Committee, Colorado Springs, CO, United States.
Sport Injury Prevention Research Centre, Faculty of Kinesiology, University of Calgary, Calgary, Canada.
J Sports Sci Med. 2021 Mar 5;20(2):188-196. doi: 10.52082/jssm.2021.188. eCollection 2021 Jun.
Missing data can influence calculations of accumulated athlete workload. The objectives of this study were to identify the best single imputation methods and to examine workload trends using multiple imputation. External (jumps per hour) and internal (rating of perceived exertion; RPE) workload were recorded for 93 (45 females, 48 males) high school basketball players throughout a season. Recorded data were simulated as missing and imputed using ten imputation methods based on the context of the individual, team and session. Both single imputation and machine learning methods were used to impute the simulated missing data. The difference between the imputed data and the actual workload values was computed as root mean squared error (RMSE). A generalized estimating equation determined the effect of imputation method on RMSE. Multiple imputation of the original dataset, with all known and actual missing workload data, was used to examine trends in longitudinal workload data. Following multiple imputation, a Pearson correlation evaluated the longitudinal association between jump count and session RPE (sRPE) over the season. A single imputation method based on the specific context of the session for which data are missing (team mean) was outperformed only by methods that combine information about the session and the individual (machine learning models). There was a significant and strong association between jump count and sRPE in both the original data and the datasets imputed using multiple imputation. The amount and nature of the missing data should be considered when choosing a method for single imputation of workload data in youth basketball. Multiple imputation using several predictor variables in a regression model can be used for analyses where workload is accumulated across an entire season.
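The evaluation procedure described above (mask known values, impute them, score the imputation by RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the toy workload values, the player and session labels, and the function names `team_mean_impute`, `individual_mean_impute` and `rmse` are all hypothetical, and only two simple single imputation rules are shown, not the ten methods compared in the study.

```python
import math
from statistics import mean

# Hypothetical toy data, NOT from the study: (player, session) -> jumps per hour.
workload = {
    ("A", 1): 40.0, ("B", 1): 50.0, ("C", 1): 60.0,
    ("A", 2): 70.0, ("B", 2): 80.0, ("C", 2): 90.0,
}

# Known values that we pretend are missing so the imputation error can be measured.
simulated_missing = {("A", 1), ("C", 2)}


def team_mean_impute(data, missing):
    """Fill each missing value with the mean of teammates observed in the same session."""
    imputed = {}
    for player, session in missing:
        observed = [v for (p, s), v in data.items()
                    if s == session and (p, s) not in missing]
        imputed[(player, session)] = mean(observed)
    return imputed


def individual_mean_impute(data, missing):
    """Fill each missing value with the mean of the same player's other observed sessions."""
    imputed = {}
    for player, session in missing:
        observed = [v for (p, s), v in data.items()
                    if p == player and (p, s) not in missing]
        imputed[(player, session)] = mean(observed)
    return imputed


def rmse(actual, imputed):
    """Root mean squared error between the true and imputed values."""
    squared_errors = [(actual[key] - value) ** 2 for key, value in imputed.items()]
    return math.sqrt(mean(squared_errors))


team_rmse = rmse(workload, team_mean_impute(workload, simulated_missing))
individual_rmse = rmse(workload, individual_mean_impute(workload, simulated_missing))
print(f"team mean RMSE: {team_rmse:.1f}")
print(f"individual mean RMSE: {individual_rmse:.1f}")
```

In this toy data the session accounts for most of the variation, so the team mean yields the lower RMSE by construction. The example shows the mechanics of the comparison only and is not evidence for either method.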