Ophthalmology Department, Gaziantep Islam Science and Technology University, Gaziantep, Turkey.
Int J Med Inform. 2024 Nov;191:105592. doi: 10.1016/j.ijmedinf.2024.105592. Epub 2024 Aug 16.
Strabismus is a common eye condition affecting both children and adults. Effective patient education is crucial for informed decision-making, but traditional methods often lack accessibility and engagement. Chatbots powered by AI have emerged as a promising solution.
This study aims to evaluate and compare the performance of three chatbots (ChatGPT, Bard, and Copilot) and a reliable website, that of the American Association for Pediatric Ophthalmology and Strabismus (AAPOS), in answering real patient questions about strabismus.
Three chatbots (ChatGPT, Bard, and Copilot) were compared to a reliable website (AAPOS) using real patient questions. Metrics included accuracy (SOLO taxonomy), understandability/actionability (PEMAT), and readability (Flesch-Kincaid). We also performed a sentiment analysis to capture the emotional tone and impact of the responses.
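The abstract reports Flesch-Kincaid Reading Ease scores but does not state how they were computed. As an illustration only, a minimal sketch of the standard formula (206.835 − 1.015 × words per sentence − 84.6 × syllables per word) with a crude syllable heuristic might look like the following; the count_syllables helper and the sample text are hypothetical and not taken from the study.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Standard Flesch Reading Ease formula:
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

sample = ("Strabismus is a condition in which the eyes do not line up with "
          "one another. Treatment may include glasses, patching, or surgery.")
print(round(flesch_reading_ease(sample), 1))  # higher score = easier to read
```

Higher scores indicate text that is easier to read, which is why the mean scores reported below can be compared directly across the four sources.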
The AAPOS website achieved the highest mean SOLO score (4.14 ± 0.47), followed by Bard, Copilot, and ChatGPT. Bard scored highest on both the PEMAT-U (74.8 ± 13.3) and PEMAT-A (66.2 ± 13.6) measures. Flesch-Kincaid Reading Ease scores showed the AAPOS website to be the easiest to read (mean 55.8 ± 14.11), closely followed by Copilot; ChatGPT and Bard had lower readability scores. The sentiment analysis revealed notable differences in emotional tone among the sources.
Chatbots, particularly Bard and Copilot, show promise in patient education for strabismus, with strengths in understandability and actionability. However, the AAPOS website outperformed the chatbots in accuracy and readability.