Doğan L, Özer Özcan Z, Edhem Yılmaz I
Department of Ophthalmology, Ömer Halisdemir University School of Medicine, 51100 Niğde, Turkey.
Department of Ophthalmology, Gaziantep City Hospital, Gaziantep, Turkey.
J Fr Ophtalmol. 2025 Feb;48(2):104381. doi: 10.1016/j.jfo.2024.104381. Epub 2024 Dec 13.
To evaluate the appropriateness, understandability, actionability, and readability of responses provided by ChatGPT-3.5, Bard, and Bing Chat to frequently asked questions about keratorefractive surgery (KRS).
Thirty-eight frequently asked questions about KRS were each submitted three times to fresh ChatGPT-3.5, Bard, and Bing Chat interfaces. Two experienced refractive surgeons categorized the chatbots' responses according to their appropriateness, and the accuracy of the responses was assessed using the Structure of the Observed Learning Outcome (SOLO) taxonomy. The Flesch Reading Ease (FRE) score and the Coleman-Liau Index (CLI) were used to evaluate the readability of the chatbots' responses. In addition, the understandability and actionability of the responses were evaluated using the Patient Education Materials Assessment Tool (PEMAT).
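For reference, the two readability indices used in this study follow the standard published formulas (not restated in the original abstract); the definitions below are the conventional ones, not values specific to this study:

FRE = 206.835 − 1.015 × (total words / total sentences) − 84.6 × (total syllables / total words)
CLI = 0.0588 × L − 0.296 × S − 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words.

Higher FRE scores indicate easier text, whereas higher CLI values correspond to higher required reading grade levels.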
The appropriateness of the ChatGPT-3.5, Bard, and Bing Chat responses was 86.8% (33/38), 84.2% (32/38), and 81.5% (31/38), respectively (P>0.05). According to the SOLO taxonomy, ChatGPT-3.5 achieved the highest mean accuracy score (3.91±0.44), followed by Bard (3.64±0.61) and Bing Chat (3.19±0.55). Bard scored better than the other chatbots for both understandability (mean PEMAT-U scores: ChatGPT-3.5, 68.5%; Bard, 78.6%; Bing Chat, 67.1%; P<0.05) and actionability (mean PEMAT-A scores: ChatGPT-3.5, 62.6%; Bard, 72.4%; Bing Chat, 60.9%; P<0.05). Both readability analyses showed that Bing Chat had the highest readability, followed by ChatGPT-3.5 and Bard; however, the readability and understandability of all three chatbots' responses were more difficult than the recommended levels.
Artificial intelligence-supported chatbots have the potential to provide detailed and appropriate responses to questions about KRS at an acceptable level. While promising for patient education in KRS, chatbots require further improvement, particularly with respect to readability and understandability.