Phillips Vidith, Rao Nidhi L, Sanghvi Yashasvi H, Nizam Maryam
Internal Medicine, Division of Biomedical Informatics and Data Science, Johns Hopkins University School of Medicine, Baltimore, USA.
Internal Medicine, KAP Viswanatham Government Medical College, Tiruchirappalli, IND.
Cureus. 2024 Nov 30;16(11):e74876. doi: 10.7759/cureus.74876. eCollection 2024 Nov.
Artificial intelligence (AI) plays a growing role in creating patient education brochures on radiological procedures. This study therefore aimed to evaluate the brochures generated by ChatGPT (San Francisco, CA: OpenAI) and Google Gemini (Mountain View, CA: Google LLC) on abdominal ultrasound, abdominal CT, and abdominal MRI.
A cross-sectional study was conducted over one week in June 2024 to evaluate the quality of patient information brochures produced by ChatGPT 3.5 and Google Gemini 1.5 Pro. Variables assessed included word count, sentence count, average words per sentence, average syllables per sentence, Flesch-Kincaid grade level, and Flesch Reading Ease score, all obtained with a Flesch-Kincaid calculator. Similarity percentage was evaluated using Quillbot (Chicago, IL: Quillbot Inc.), and reliability was measured using the modified DISCERN score. Statistical analysis was conducted using R version 4.3.2 (Vienna, Austria: R Foundation for Statistical Computing).
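For reference, the two readability outcomes rest on the standard Flesch formulas. The sketch below is a minimal illustration in R (the study's analysis language), assuming the textbook formulas and a crude vowel-group syllable heuristic; the study itself used a dedicated Flesch-Kincaid calculator, and the function names here are hypothetical.

```r
# Minimal sketch of the readability metrics assessed in the study.
# Assumptions: standard Flesch formulas and a heuristic syllable counter;
# real calculators use dictionary-based syllabification.

count_syllables <- function(word) {
  # Crude heuristic: count groups of consecutive vowels, minimum of one.
  groups <- gregexpr("[aeiouy]+", tolower(word))[[1]]
  max(1L, sum(groups > 0))
}

flesch_metrics <- function(text) {
  sentences <- unlist(strsplit(text, "[.!?]+\\s*"))
  sentences <- sentences[nchar(sentences) > 0]
  words <- unlist(strsplit(text, "[^A-Za-z']+"))
  words <- words[nchar(words) > 0]
  syllables <- sum(vapply(words, count_syllables, integer(1)))

  wps <- length(words) / length(sentences)  # average words per sentence
  spw <- syllables / length(words)          # average syllables per word

  list(
    word_count     = length(words),
    sentence_count = length(sentences),
    ease  = 206.835 - 1.015 * wps - 84.6 * spw,  # Flesch Reading Ease
    grade = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid grade level
  )
}

flesch_metrics("An abdominal ultrasound uses sound waves to image organs. It is painless.")
```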
P-values <0.05 were considered significant. There was no significant difference between the two chatbots in sentence count (p=0.8884), average words per sentence (p=0.1984), average syllables per sentence (p=0.3868), Flesch Reading Ease score (p=0.1812), similarity percentage (p=0.8110), or modified DISCERN reliability score (p=0.6495). However, ChatGPT produced a significantly higher word count (p=0.0409) and Flesch-Kincaid grade level (p=0.0482) than Google Gemini.
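The abstract reports p-values without naming the statistical test, so the comparison below is only a hypothetical sketch, assuming an unpaired two-sample t-test on per-brochure word counts; the numbers are invented placeholders, not the study's data.

```r
# Hypothetical sketch of one per-metric comparison between the two chatbots.
# Assumption: an unpaired two-sample t-test; the abstract does not name the
# test actually used, and these word counts are invented for illustration.
word_counts <- data.frame(
  tool  = rep(c("ChatGPT", "Gemini"), each = 3),
  words = c(512, 498, 530,   # placeholder ChatGPT brochure word counts
            401, 455, 420)   # placeholder Google Gemini brochure word counts
)

result <- t.test(words ~ tool, data = word_counts)
result$p.value  # p < 0.05 would be read as significant, mirroring the study's threshold
```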
Both ChatGPT and Google Gemini generated content of comparable readability and reliability. Nevertheless, the significant differences in word count and grade level highlight a key area for improvement: tailoring content to patients with varying levels of health literacy.