Liu Cheng, Xu Bingxiang, Wan Kang, Sun Qin, Wang Ruwen, Feng Yue, Shao Hui, Liu Tiemin, Wang Ru
School of Kinesiology, Shanghai University of Sport, Qingyuanhuan Road, #650, Yangpu District, Shanghai, 200438 China.
Faculty of Physical Culture and Sports, Ryazan State University, Ryazan, 390000 Russia.
Phenomics. 2024 Nov 20;4(5):465-472. doi: 10.1007/s43657-024-00176-8. eCollection 2024 Oct.
The field of competitive swimming lacks broadly applicable predictive models for talent identification across the age groups of adolescent swimmers. This study aimed to construct a predictive model for athletic talent using machine learning methods based on anthropometric and physiological data. Baseline data were collected from 5444 participants aged 10-18 years in Shanghai, China, between 2015 and 2018, of whom 4969 completed a 3-year follow-up. Talented swimmers were identified from their performance over the follow-up period, which revealed age- and sex-dependent developmental differences between swimmers classified as talented and those classified as non-talented. After controlling for the confounding variables age and sex, nine machine learning algorithms were compared; Random Forest achieved the highest performance and was selected as the final model. The model demonstrated excellent predictive performance on both the test dataset and an independent validation dataset from Shandong (n = 118), indicating strong generalizability. Furthermore, interpreting the model with the SHapley Additive exPlanations (SHAP) method identified abdominal skinfold, lung capacity, chest circumference, shoulder width, and triceps skinfold as the five most critical indicators for talent identification.
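The SHAP-based ranking step can be illustrated with a minimal sketch. This is not the authors' code. The scoring function `toy_talent_score`, its coefficients, the three z-scored features, and the sample values are all hypothetical, and the function merely stands in for the study's trained Random Forest. Shapley values are computed exactly by enumerating feature coalitions, with features outside a coalition set to a single baseline (zero, the mean of z-scored features); this is a simplification of how SHAP implementations usually average over background data. Features are then ranked by mean absolute Shapley value, a common way to summarize global importance.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names (z-scored inputs), chosen for illustration only.
FEATURES = ["abdominal_skinfold", "lung_capacity", "chest_circumference"]


def toy_talent_score(z):
    """Hypothetical stand-in for a trained classifier's score.

    NOT the study's Random Forest: the coefficients are arbitrary and include
    one lung x chest interaction so that Shapley values must share credit.
    """
    skin, lung, chest = z
    return -0.8 * skin + 0.6 * lung + 0.3 * chest + 0.2 * lung * chest


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating all coalitions.

    Features outside a coalition take their baseline value (a single-reference
    simplification; SHAP implementations typically use background data).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if j in subset or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def global_importance(model, samples, baseline, names):
    """Rank features by mean absolute Shapley value over the samples."""
    totals = [0.0] * len(names)
    for x in samples:
        for i, value in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(value)
    means = [t / len(samples) for t in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    baseline = [0.0, 0.0, 0.0]  # mean of z-scored features (assumption)
    samples = [[1.0, 2.0, -1.0], [-0.5, 1.0, 0.5], [2.0, -1.0, 1.0]]  # made-up values
    for name, score in global_importance(toy_talent_score, samples, baseline, FEATURES):
        print(f"{name}: {score:.3f}")
```

The Shapley values for each sample sum to the difference between the model's output for that sample and its output at the baseline, which is the additivity property SHAP relies on. Exact enumeration costs O(2^n) model evaluations per feature, so it only suits a handful of features; for tree ensembles such as a Random Forest, the `shap` library provides `TreeExplainer`, which implements the polynomial-time TreeSHAP algorithm.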
The online version contains supplementary material available at 10.1007/s43657-024-00176-8.