Clinical Evidence Development, Aetna Medical Affairs, CVS Health Corporation, Hartford, CT, United States.
Department of Preventive Cardiology, Hartford Hospital, Hartford, CT, United States.
JMIR Med Educ. 2024 Jan 11;10:e51308. doi: 10.2196/51308.
Regular physical activity is critical for health and disease prevention. Yet, health care providers and patients face barriers to implementing evidence-based lifestyle recommendations. The potential to augment care with the increased availability of artificial intelligence (AI) technologies is limitless; however, the suitability of AI-generated exercise recommendations has yet to be explored.
The purpose of this study was to assess the comprehensiveness, accuracy, and readability of individualized exercise recommendations generated by a novel AI chatbot.
A coding scheme was developed to score AI-generated exercise recommendations across 10 categories informed by gold-standard exercise recommendations: (1) health condition-specific benefits of exercise, (2) exercise preparticipation health screening, (3) frequency, (4) intensity, (5) time, (6) type, (7) volume, (8) progression, (9) special considerations, and (10) references to the primary literature. The AI chatbot was prompted to provide individualized exercise recommendations for 26 clinical populations using an open-source application programming interface. Two independent reviewers coded AI-generated content for each category and calculated comprehensiveness (%) and factual accuracy (%) on a scale of 0%-100%. Readability was assessed using the Flesch-Kincaid formula. Qualitative analysis identified and categorized themes from AI-generated output.
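As a rough illustration of the readability step, the sketch below implements the standard published Flesch reading ease and Flesch-Kincaid grade level formulas from raw word, sentence, and syllable counts. The function names and the sample counts are hypothetical, chosen only for illustration; the abstract does not state which tool the authors used to compute these scores.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch reading ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage (not data from the study).
words, sentences, syllables = 100, 5, 150
print(f"Reading ease: {flesch_reading_ease(words, sentences, syllables):.1f}")  # 59.6
print(f"Grade level:  {flesch_kincaid_grade(words, sentences, syllables):.1f}")  # 9.9
```

Both formulas depend only on average sentence length and average syllables per word, so longer sentences and more polysyllabic words lower the reading ease score and raise the grade level.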
AI-generated exercise recommendations were 41.2% (107/260) comprehensive and 90.7% (146/161) accurate, with the majority (8/15, 53%) of inaccuracies related to the need for exercise preparticipation medical clearance. The average readability level of AI-generated exercise recommendations was at the college level (mean 13.7, SD 1.7), with an average Flesch reading ease score of 31.1 (SD 7.7). Recurring themes and observations in the AI-generated output included concern for liability and safety, preference for aerobic exercise, and potential bias and direct discrimination against certain age-based populations and individuals with disabilities.
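The reported percentages can be checked with simple arithmetic. The sketch below recomputes them from the counts given above, assuming the comprehensiveness denominator of 260 is the 26 clinical populations multiplied by the 10 scoring categories; the helper name `pct` is hypothetical.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


populations, categories = 26, 10
total_items = populations * categories   # 260 scored items (assumed derivation)

comprehensiveness = pct(107, total_items)  # 41.2
accuracy = pct(146, 161)                   # 90.7
inaccurate_items = 161 - 146               # 15, matches the denominator in 8/15
clearance_share = pct(8, inaccurate_items) # 53.3, reported as 53%

print(comprehensiveness, accuracy, inaccurate_items, clearance_share)
```

The counts are internally consistent: the 15 items not scored as accurate match the denominator used for the preparticipation medical clearance finding.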
There were notable gaps in the comprehensiveness, accuracy, and readability of AI-generated exercise recommendations. Exercise and health care professionals should be aware of these limitations when using and endorsing AI-based technologies as a tool to support lifestyle change involving exercise.