Department of Sports Science and Movement Pedagogy, Technische Universität Braunschweig, Braunschweig, Germany.
Integrative and Experimental Exercise Science, Department of Sport Science, University of Würzburg, Würzburg, Germany.
J Sports Sci Med. 2024 Mar 1;23(1):56-72. doi: 10.52082/jssm.2024.56. eCollection 2024 Mar.
Runners may use ChatGPT to generate training plans aimed at improving performance or health. However, the quality of ChatGPT-generated training plans based on different input information is unknown. The objective of this study was to evaluate ChatGPT-generated six-week training plans for runners based on different levels of input information granularity. Three training plans were generated by ChatGPT, each from input of a different granularity. Twenty-two quality criteria for training plans were drawn from the literature, and coaching experts used them to rate each plan on a 1-5 Likert scale. A Friedman test assessed significant differences in quality between the training plans. Of the 22 criteria, training plans 1, 2, and 3 received a median rating of <3 on 19, 11, and 1 criteria, a median rating of 3 on 3, 5, and 8 criteria, and a median rating of >3 on 0, 6, and 13 criteria, respectively. Training plan 1 received significantly lower ratings than training plan 2 on 3 criteria and significantly lower ratings than training plan 3 on 15 criteria (p < 0.05). Training plan 2 received significantly lower ratings than training plan 3 on 9 criteria (p < 0.05). Coaching experts rated the ChatGPT-generated plans as suboptimal, although quality increased when more input information was provided. An understanding of the aspects relevant to programming distance running training is important, and we advise against using ChatGPT-generated training plans without an expert coach's feedback.
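The per-criterion comparison described above can be illustrated with a minimal Python sketch of a Friedman test for one quality criterion. Everything in it is an assumption made for illustration: the abstract does not report the number of raters, the individual ratings, or the software used, so the six rows of `ratings` are hypothetical and the function names are invented here. The p-value uses the chi-square approximation, which reduces to exp(-Q/2) only because three plans give two degrees of freedom, and which is rough for small rater panels.

```python
import math
from statistics import median


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_plans(rows):
    """Tie-corrected Friedman test; each row is one rater's (plan 1, plan 2, plan 3) ratings."""
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        for col, rank in enumerate(average_ranks(list(row))):
            rank_sums[col] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t**3 - t
    correction = 1 - tie_term / (n * k * (k**2 - 1))
    if correction == 0:
        raise ValueError("every rater gave all three plans the same rating")
    q = 12 / (n * k * (k + 1)) * sum(s**2 for s in rank_sums) - 3 * n * (k + 1)
    q /= correction
    p = math.exp(-q / 2)  # chi-square upper tail, exact only for df = k - 1 = 2
    return q, p


# Hypothetical 1-5 Likert ratings from six raters for a single criterion.
ratings = [(2, 3, 4), (1, 3, 4), (2, 2, 4), (2, 3, 5), (1, 2, 4), (2, 3, 3)]

stat, p_value = friedman_three_plans(ratings)
medians = [median(column) for column in zip(*ratings)]
print(f"medians per plan: {medians}")
print(f"Friedman Q = {stat:.2f}, p = {p_value:.4f}")
```

In a real analysis this would be repeated for each of the 22 criteria, and a significant result would be followed by pairwise comparisons between plans; the abstract does not say which post hoc procedure was used, so none is sketched here.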