S Adithya, Aggarwal Shreyas, Sridhar Janani, Vs Kavya, John Victoria P, Singh Chaihthanya
Medical School, Ramaiah Medical College, Bangalore, IND.
Geriatrics, Prince Charles Hospital, Cwm Taf Morgannwg University Health Board, Merthyr Tydfil, GBR.
Cureus. 2024 Aug 31;16(8):e68307. doi: 10.7759/cureus.68307. eCollection 2024 Aug.
Introduction: This study assesses the readability of AI-generated brochures for common emergency medical conditions such as heart attack, anaphylaxis, and syncope. The aim was to compare patient information guides for these conditions generated by ChatGPT and Google Gemini.

Methodology: Brochures for each condition were created with both AI tools. Readability was assessed using the Flesch-Kincaid Calculator, which evaluates word count, sentence count, and reading ease. Reliability was measured using the Modified DISCERN score. Similarity between the AI outputs was determined using Quillbot. Statistical analysis was performed with R (v4.3.2).

Results: ChatGPT and Gemini produced brochures with no statistically significant differences in word count (p = 0.2119), sentence count (p = 0.1276), readability (p = 0.3796), or reliability (p = 0.7407). However, ChatGPT provided more detailed content, with 32.4% more words (582.80 vs. 440.20) and 51.6% more sentences (67.00 vs. 44.20), while Gemini's brochures were slightly easier to read, with a higher reading ease score (50.62 vs. 41.88). Reliability varied by topic: ChatGPT scored higher for heart attack (4 vs. 3) and choking (3 vs. 2), whereas Google Gemini scored higher for anaphylaxis (4 vs. 3) and drowning (4 vs. 3), highlighting the need for topic-specific evaluation.

Conclusions: AI-generated brochures from ChatGPT and Gemini are comparable as patient information on emergency medical conditions: the study found no statistically significant difference in readability or reliability between the responses generated by the two AI tools.
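For readers unfamiliar with the readability metric cited above, the sketch below illustrates the standard Flesch Reading Ease formula (206.835 - 1.015 x words/sentences - 84.6 x syllables/words), where higher scores indicate easier text. This is an illustration only, not the calculator used in the study; the naive vowel-group syllable heuristic and function names are assumptions for the example.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels (assumption, not the study's method).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Example: short, plain sentences score high on the 0-100 scale
# (the study reports mean scores of about 50.6 for Gemini vs. 41.9 for ChatGPT).
print(round(flesch_reading_ease("Call emergency services. Keep the person calm and still."), 1))
```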