Gürbostan Soysal Gizem, Mercanlı Murat, Özcan Zeynep Özer, Yılmaz İbrahim Edhem, Berhuni Mustafa
Gaziantep City Hospital, University of Health Sciences, Gaziantep, Turkey.
Ophthalmology Clinic, Dünyagöz Hospital, Adana, Turkey.
Clin Exp Optom. 2025 Jun 26:1-5. doi: 10.1080/08164622.2025.2517750.
Artificial intelligence chatbots demonstrate potential as valuable educational resources for patients with dry eye disease, offering complementary information to established medical platforms.
The increasing prevalence of dry eye disease necessitates reliable and comprehensible patient information resources. This study evaluates and compares the quality of information provided by contemporary AI chatbots with established ophthalmological sources.
Three leading AI chatbots (ChatGPT-3.5, Gemini, and Llama) and the American Academy of Ophthalmology (AAO) website were systematically evaluated using 20 common patient questions about dry eye disease. Responses were assessed for accuracy using the Structure of Observed Learning Outcomes (SOLO) taxonomy, for understandability and actionability using the Patient Education Materials Assessment Tool (PEMAT), and for linguistic accessibility using Flesch-Kincaid readability metrics.
Gemini demonstrated superior understandability with a mean PEMAT-U score of 73.4 ± 11.4, significantly higher than ChatGPT (65.4 ± 10.6), Llama (63.4 ± 10.3), and AAO (52.5 ± 19.3) (p < 0.001). No significant differences were observed in actionability scores (p = 0.120). The AAO website exhibited the highest reading ease score (50.4 ± 17.9, p = 0.015). For accuracy assessment, ChatGPT achieved the highest mean SOLO score (3.4 ± 0.7), followed closely by Gemini (3.3 ± 0.8), with no significant performance differences detected among chatbots (p = 0.574). No instances of incorrect or potentially harmful information were identified across any evaluated source.
While AI chatbots demonstrate promising capabilities for patient education in dry eye disease, particularly in providing comprehensive and understandable information, their higher linguistic complexity presents a potential accessibility barrier. Future development should focus on enhancing readability while maintaining comprehensive content, positioning chatbots as valuable complements to - rather than replacements for - professional medical consultation.
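The readability comparison rests on the Flesch Reading Ease formula, where higher scores indicate easier text. As an informal illustration only (not the authors' tooling), the score can be sketched in Python with a naive vowel-group syllable counter; real readability software uses more careful syllabification:

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable count via vowel groups (approximation only)."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a trailing silent 'e' usually does not add a syllable.
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Scores near the AAO's reported 50.4 correspond to "fairly difficult" text, while typical patient-education targets are 60 or above.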