Rocha-Silva Rizia, de Lima Bráulio Evangelista, Costa Thalles Guilarducci, Morais Naiane Silva, José Geovana, Cordeiro Douglas Farias, de Almeida Alexandre Aparecido, Lopim Glauber Menezes, Viana Ricardo Borges, Sousa Bolivar Saldanha, Colugnati Diego Basile, Vancini Rodrigo Luiz, Andrade Marília Santos, Weiss Katja, Knechtle Beat, Arida Ricardo Mario, de Lira Claudio Andre Barbosa
Center for Teaching and Research Applied to Education, Federal University of Goiás, Goiânia, Brazil; Faculty of Medicine, Postgraduate Program in Health Sciences, Federal University of Goiás, Goiânia, Brazil.
Epilepsy Behav. 2025 Feb;163:110193. doi: 10.1016/j.yebeh.2024.110193. Epub 2024 Dec 4.
This study aims to evaluate the similarity, readability, and alignment with current scientific knowledge of responses from AI-based chatbots to common questions about epilepsy and physical exercise.
Four AI chatbots (ChatGPT-3.5, ChatGPT 4, Google Gemini, and Microsoft Copilot) were evaluated. Fourteen questions on epilepsy and physical exercise were designed to compare the platforms. Lexical similarity, response patterns, and thematic content were analyzed. Readability was measured using the Flesch Reading Ease and Flesch-Kincaid Grade Level scores. Seven experts rated the quality of responses on a Likert scale from "very poor" to "very good."
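For context, both readability metrics are closed-form functions of average sentence length and average syllables per word. The sketch below illustrates the standard formulas only and is not the authors' analysis code; the regex-based syllable counter is a rough approximation (dedicated readability tools apply more careful syllable rules).

    import re

    def count_syllables(word):
        # Approximation: count runs of consecutive vowels (treats "y" as a vowel).
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_scores(text):
        # Flesch Reading Ease and Flesch-Kincaid Grade Level, per the standard formulas.
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        n_syllables = sum(count_syllables(w) for w in words)
        fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)
        fkgl = 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59
        return round(fre, 2), round(fkgl, 2)

On the Reading Ease scale, higher scores indicate easier text; values below 50 correspond to difficult, college-level or harder prose, consistent with the finding reported below that all responses were rated difficult to read.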
The responses showed lexical similarity, with approaches to physical exercise ranging from conservative to holistic. Microsoft Copilot scored the highest on the Flesch Reading Ease scale (48.42 ± 13.71), while ChatGPT-3.5 scored the lowest (23.84 ± 8.19). All responses were generally rated as difficult to read. Quality ratings ranged from "Good" to "Acceptable," with ChatGPT 4 being the preferred platform, chosen by 48.98% of reviewers.
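The abstract does not state which similarity measure was used; a minimal sketch of one common choice, TF-IDF cosine similarity between response texts, is given below. The example responses are invented placeholders, not the platforms' actual answers.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical stand-ins for the four platforms' answers to one question.
    responses = {
        "ChatGPT-3.5": "People with epilepsy can exercise but should consult a neurologist first.",
        "ChatGPT 4": "Regular physical exercise is generally safe and beneficial in epilepsy.",
        "Google Gemini": "Exercise benefits people with epilepsy; begin gradually and monitor triggers.",
        "Microsoft Copilot": "Most people with epilepsy can exercise safely with a few precautions.",
    }

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(responses.values())
    sim = cosine_similarity(tfidf)  # 4 x 4 matrix; entry [i, j] is the lexical overlap of platforms i and j
    for i, a in enumerate(responses):
        for j, b in enumerate(responses):
            if i < j:
                print(f"{a} vs {b}: {sim[i, j]:.2f}")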
The findings highlight the potential of AI chatbots as useful sources of information on epilepsy and physical exercise. However, simplifying language and tailoring content to users' needs are essential to enhance their effectiveness.