Schumacher Inès, Ferro Desideri Lorenzo, Bühler Virginie Manuela Marie, Sagurski Nicola, Subhi Yousif, Bhardwaj Gaurav, Roth Janice, Anguita Rodrigo
Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland.
Department of Ophthalmology, Rigshospitalet, Glostrup, Denmark.
Digit Health. 2025 May 11;11:20552076251320298. doi: 10.1177/20552076251320298. eCollection 2025 Jan-Dec.
To evaluate the performance of a custom ChatGPT-based chatbot in triaging ophthalmic emergencies compared with that of trained ophthalmologists.
One hundred hypothetical ophthalmic cases were created based on actual patient data from an ophthalmic emergency department, including details such as age, symptoms and medical history. Three experienced ophthalmologists independently graded these cases using a four-tier severity scale, ranging from Grade 1 (immediate care required) to Grade 4 (non-urgent care). A customized version of ChatGPT was developed to perform the same grading task. Inter-rater agreement was measured between the chatbot and the ophthalmologists, as well as among all human graders.
The chatbot demonstrated substantial agreement with ophthalmologists 1, 2 and 3, achieving Cohen's kappa scores of 0.737, 0.749 and 0.751, respectively. The highest agreement was between ophthalmologist 3 and the chatbot (κ = 0.751). Fleiss' kappa for overall agreement among all graders was 0.79, indicating substantial agreement. The Kruskal-Wallis test showed no statistically significant differences in the distribution of grades assigned by the chatbot and the ophthalmologists (p = 0.967). Bootstrap analysis revealed no significant difference in kappa values between the chatbot and human graders (p = 0.572, 95% CI -0.163 to 0.072).
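The pairwise agreement statistic reported above can be sketched as follows. This is a minimal, self-contained implementation of Cohen's kappa; the five-case grade lists are hypothetical toy data for illustration, not taken from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who each assigned one categorical grade to the same set of cases."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of cases graded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[g] * freq_b[g] for g in freq_a.keys() | freq_b.keys()) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical four-tier severity grades (1 = immediate, 4 = non-urgent)
# for five cases, illustrative only.
chatbot_grades = [1, 2, 2, 3, 4]
human_grades = [1, 2, 3, 3, 4]
print(round(cohens_kappa(chatbot_grades, human_grades), 3))  # → 0.737
```

With four agreements out of five cases (p_o = 0.8) and an expected chance agreement of 0.24 from the marginals, the toy example lands at κ ≈ 0.737, in the "substantial agreement" band (0.61-0.80) conventionally attributed to Landis and Koch.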
The study demonstrates that a customized chatbot can perform ophthalmic triage with a level of accuracy comparable to that of trained ophthalmologists. This suggests that AI-assisted triage could be a valuable tool in emergency departments, potentially enhancing clinical workflows and reducing waiting times while maintaining high standards of patient care.