Arslan Selva, Usta Küçükbezirci Güldeniz
Department of Ophthalmology, University of Health Sciences, Sadi Konuk Training and Research Hospital, Istanbul, TUR.
Department of Ophthalmology, University of Health Sciences, Istanbul Training and Research Hospital, Istanbul, TUR.
Cureus. 2025 Jul 29;17(7):e88980. doi: 10.7759/cureus.88980. eCollection 2025 Jul.
Purpose: This study evaluates the performance of ChatGPT and Google Gemini in addressing refractive surgery-related patient questions by analysing the accuracy, completeness, and readability of their responses.
Methods: A total of 40 refractive surgery-related questions were compiled and categorized into three levels of difficulty: easy, medium, and hard. Responses from ChatGPT and Google Gemini were blinded and evaluated by two experienced ophthalmologists using standardized criteria. Accuracy was scored on a six-point Likert scale, completeness on a three-point scale, and readability using the Flesch-Kincaid Grade Level, Gunning Fog Index, Simple Measure of Gobbledygook (SMOG) Index, and word count. Intra- and inter-rater reliability were assessed using intra-class correlation coefficients (ICC).
Results: Both chatbots demonstrated high intra-rater (ICC > 0.75) and inter-rater reliability. Accuracy scores were similar for most questions; however, statistically significant differences were observed for harder questions, where Gemini showed slightly reduced performance compared to ChatGPT. Readability metrics revealed no significant differences between the two tools, although ChatGPT responses tended to be more detailed, while Gemini generated more concise answers. Harder questions resulted in longer and more complex responses, as indicated by higher Gunning Fog and SMOG Index scores.
Conclusions: ChatGPT and Google Gemini exhibit strong potential in patient education, with complementary strengths in accuracy, readability, and response detail. The influence of question complexity on chatbot performance highlights the need for ongoing optimization to enhance both clarity and accessibility. These findings underscore the value of integrating artificial intelligence (AI) tools into healthcare to support patient education and engagement.
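For readers who wish to apply the same readability indices to their own chatbot responses, a minimal sketch using the open-source Python package textstat is shown below. The sample answer string is hypothetical, and this is only an illustration of the metrics named in the Methods, not the authors' actual analysis pipeline.

```python
# Minimal sketch (assumption: textstat is installed): score a single chatbot
# answer with the readability indices used in the study. The answer text here
# is a hypothetical example, not study data.
import textstat

answer = (
    "LASIK reshapes the cornea with an excimer laser to correct refractive "
    "errors such as myopia, hyperopia, and astigmatism. Most patients notice "
    "clearer vision within a day, although full stabilization can take weeks."
)

print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(answer))
print("Gunning Fog Index:         ", textstat.gunning_fog(answer))
print("SMOG Index:                ", textstat.smog_index(answer))
print("Word count:                ", textstat.lexicon_count(answer, removepunct=True))
```

Note that the SMOG formula is defined over samples of about 30 sentences, so scores for very short answers should be interpreted with caution.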