Artificial intelligence chatbot performance in triage of ophthalmic conditions.
Author information
Lyons Riley J, Arepalli Sruthi R, Fromal Ollya, Choi Jinho D, Jain Nieraj
Affiliations
Department of Ophthalmology, Emory University School of Medicine, Atlanta, GA.
Department of Computer Science, Emory University, Atlanta, GA.
Publication information
Can J Ophthalmol. 2024 Aug;59(4):e301-e308. doi: 10.1016/j.jcjo.2023.07.016. Epub 2023 Aug 9.
BACKGROUND
Timely access to human expertise for affordable and efficient triage of ophthalmic conditions is inconsistent. With recent advancements in publicly available artificial intelligence (AI) chatbots, the lay public may turn to these tools for triage of ophthalmic complaints. Validation studies are necessary to evaluate the performance of AI chatbots as triage tools and inform the public regarding their safety.
OBJECTIVE
To evaluate the triage performance of AI chatbots for ophthalmic conditions.
DESIGN
Cross-sectional study.
SETTING
Single centre.
PARTICIPANTS
Ophthalmology trainees, OpenAI ChatGPT (GPT-4), Bing Chat, and WebMD Symptom Checker.
METHODS
Forty-four clinical vignettes representing common ophthalmic complaints were developed, and a standardized pathway of prompts was presented to each tool in March 2023. Primary outcomes were proportion of responses with the correct diagnosis listed in the top 3 possible diagnoses and proportion with correct triage urgency. Ancillary outcomes included presence of grossly inaccurate statements, mean reading grade level, mean response word count, proportion with attribution, and most common sources cited.
RESULTS
Ophthalmology trainees, ChatGPT, Bing Chat, and the WebMD Symptom Checker listed the appropriate diagnosis among the top 3 suggestions in 42 (95%), 41 (93%), 34 (77%), and 8 (33%) cases, respectively. Triage urgency was appropriate in 38 (86%), 43 (98%), and 37 (84%) cases for ophthalmology trainees, ChatGPT, and Bing Chat, respectively.
CONCLUSIONS
ChatGPT using the GPT-4 model offered high diagnostic and triage accuracy comparable with that of ophthalmology trainees, with no grossly inaccurate statements. Bing Chat had lower accuracy and a tendency to overestimate triage urgency.