Chaulagain Aayush, Aujla Savvy, Priyadarsini Archana, Godavarthi Aarav K, Yaqoob Usman
Internal Medicine, Shaheed Ziaur Rahman Medical College and Hospital, Bogra, BGD.
Orthopedics, Government Medical College, Amritsar, Amritsar, IND.
Cureus. 2025 May 20;17(5):e84507. doi: 10.7759/cureus.84507. eCollection 2025 May.
Introduction
This study compares the characteristics of patient education brochures produced by two large language models for common neurological diseases: migraine (MIG), Parkinson's disease, and Alzheimer's disease (AD). Despite the enthusiasm surrounding these technologies, there remains a critical need to systematically investigate their effectiveness, usability, and impact in healthcare contexts. This cross-sectional study examines patient education brochures for AD, Parkinsonism, and MIG, emphasizing the emerging role of AI-driven tools such as ChatGPT and Google Gemini.
Methods
Using a patient information brochure approach, we compared responses generated by ChatGPT and Google Gemini, which were, at the time of the study, two of the best-known and most developed AI tools, using the prompt "This cross-sectional study investigates patient education brochures for Alzheimer's disease, Parkinsonism, and migraine, emphasizing the emerging role of AI-driven tools, such as ChatGPT and Google Gemini." Readability and reliability were assessed using the Flesch-Kincaid calculator and the modified DISCERN score, respectively. Statistical analysis was conducted using R software version 4.3.2.
Results
The results show no significant difference in mean word and sentence counts between the models, although Google Gemini produced shorter texts with fewer sentences (p = 0.04). Both models had similar average words per sentence (p = 0.97) and syllables per word (p = 0.28), but Google Gemini's texts were slightly more complex (reading ease score, p = 0.29). Google Gemini's outputs were also more original, with lower similarity scores (p = 0.04). Pearson correlation coefficients indicated a moderate negative, though statistically nonsignificant, relationship between ease and reliability scores for both models.
Conclusions While Google Gemini produced shorter and potentially more original content, no significant superiority of one AI tool over the other was observed, suggesting the need for ongoing refinement to optimize patient education materials for neurological conditions.
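The Flesch Reading Ease metric underlying the Flesch-Kincaid calculator cited in Methods can be sketched as follows. This is a minimal illustration, not the specific calculator used in the study; the syllable count uses a rough vowel-group heuristic rather than the dictionary-based counting a production tool would apply.

```python
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as contiguous vowel groups (heuristic only).
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Split into sentences and words with simple regex tokenization.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Standard Flesch Reading Ease formula; higher scores = easier text.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Scores typically fall between 0 and 100 for ordinary prose (60-70 is often treated as plain English), which is how brochure readability comparisons such as the one above are interpreted.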