Schumacher Inès, Ferro Desideri Lorenzo, Bühler Virginie Manuela Marie, Sagurski Nicola, Subhi Yousif, Bhardwaj Gaurav, Roth Janice, Anguita Rodrigo
Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland.
Department of Ophthalmology, Rigshospitalet, Glostrup, Denmark.
Digit Health. 2025 May 11;11:20552076251320298. doi: 10.1177/20552076251320298. eCollection 2025 Jan-Dec.
To evaluate the performance of a custom ChatGPT-based chatbot in triaging ophthalmic emergencies compared with that of trained ophthalmologists.
One hundred hypothetical ophthalmic cases were created based on actual patient data from an ophthalmic emergency department, including details such as age, symptoms and medical history. Three experienced ophthalmologists independently graded these cases using a four-tier severity scale, ranging from Grade 1 (immediate care required) to Grade 4 (non-urgent care). A customized version of ChatGPT was developed to perform the same grading task. Inter-rater agreement was measured between the chatbot and the ophthalmologists, as well as among all human graders.
The chatbot demonstrated substantial agreement with ophthalmologists 1, 2 and 3, achieving Cohen's kappa scores of 0.737, 0.749 and 0.751, respectively. The highest agreement was between ophthalmologist 3 and the chatbot (κ = 0.751). Fleiss' kappa for overall agreement among all graders was 0.79, indicating substantial agreement. The Kruskal-Wallis test showed no statistically significant differences in the distribution of grades assigned by the chatbot and the ophthalmologists (p = 0.967). Bootstrap analysis revealed no significant difference in kappa values between the chatbot and human graders (p = 0.572, 95% CI -0.163 to 0.072).
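The pairwise agreement statistic reported above can be sketched as follows. This is a minimal, self-contained implementation of Cohen's kappa; the five-case grade lists are hypothetical toy data for illustration, not taken from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who each assigned one categorical grade to the same set of cases."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of cases graded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[g] * freq_b[g] for g in freq_a.keys() | freq_b.keys()) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical four-tier severity grades (1 = immediate, 4 = non-urgent)
# for five cases, illustrative only.
chatbot_grades = [1, 2, 2, 3, 4]
human_grades = [1, 2, 3, 3, 4]
print(round(cohens_kappa(chatbot_grades, human_grades), 3))  # → 0.737
```

With four agreements out of five cases (p_o = 0.8) and an expected chance agreement of 0.24 from the marginals, the toy example lands at κ ≈ 0.737, in the "substantial agreement" band (0.61-0.80) conventionally attributed to Landis and Koch.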
The study demonstrates that a customized chatbot can perform ophthalmic triage with a level of accuracy comparable to that of trained ophthalmologists. This suggests that AI-assisted triage could be a valuable tool in emergency departments, potentially enhancing clinical workflows and reducing waiting times while maintaining high standards of patient care.