Havers Tim, Masur Lukas, Isenmann Eduard, Geisler Stephan, Zinner Christoph, Sperlich Billy, Düking Peter
Department of Fitness and Health, IST University of Applied Sciences, Düsseldorf, Germany.
Faculty of Sport and Health Sciences, Technical University of Munich, Munich, Germany.
Biol Sport. 2025 Apr;42(2):289-329. doi: 10.5114/biolsport.2025.145911. Epub 2024 Dec 18.
Large Language Models (LLMs) are increasingly utilized in various domains, including the generation of training plans. However, the reproducibility and quality of training plans produced by different LLMs have not been studied extensively. This study aims to: i) investigate and compare the quality of muscle hypertrophy-related resistance training (RT) plans generated by Google Gemini (GG) and GPT-4, and ii) assess the reproducibility of the RT plans when the same prompts are provided multiple times. Two distinct prompts were used, one providing little information about the training plan requirements and the other providing detailed information. These prompts were input into GG and GPT-4 by two different individuals, resulting in the generation of eight RT plans. These plans were evaluated by 12 coaching experts using a 5-point Likert scale, based on quality criteria derived from the literature. The coaching expert evaluations indicated a high degree of reproducibility when the same distinct prompts were provided multiple times to the LLMs of interest, with 27 out of 28 items showing no differences (p > 0.05). Overall, GPT-4 was rated higher on several aspects of RT quality criteria (p < 0.001 to p = 0.043). Additionally, compared to prompts with little information, higher information density within the prompts resulted in higher-rated RT quality (p < 0.001 to p = 0.037). Our findings show that RT plans can be generated reproducibly with the same quality when using the same prompts. Furthermore, quality improves with more detailed input, and GPT-4 outperformed GG in generating higher-quality plans. These results suggest that detailed information input is crucial for LLM performance.