Comparative analysis of large language models against the NHS 111 online triaging for emergency ophthalmology

Authors

Khan Shaheryar Ahmed, Gunasekera Chrishan

Affiliations

Ophthalmology Department, Moorfields Eye Hospital, London, UK.

Ophthalmology Department, Norfolk & Norwich University Hospital, Norwich, UK.

Publication information

Eye (Lond). 2025 May;39(7):1301-1308. doi: 10.1038/s41433-025-03605-8. Epub 2025 Jan 21.

DOI:10.1038/s41433-025-03605-8
PMID:39838136
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12043832/

Abstract

BACKGROUND

This study evaluates the performance of various large language models (LLMs) in generating responses for ophthalmology emergencies and compares their accuracy with that of the United Kingdom's established National Health Service (NHS) 111 online system.

METHODS

We included 21 ophthalmology-related emergency scenario questions from the NHS 111 triaging algorithm. These questions were based on four ophthalmology emergency themes laid out in the NHS 111 algorithm. Responses generated by NHS 111 online were compared with the responses of several LLM chatbots to determine the accuracy of the LLM responses. The models included ChatGPT-3.5, Google Bard, Bing Chat, and ChatGPT-4.0. The accuracy of each LLM chatbot response was compared against the NHS 111 triage using a two-prompt strategy. Answers were graded as follows: -2 as "Very poor", -1 as "Poor", 0 as "No response", 1 as "Good", 2 as "Very good", and 3 as "Excellent".

RESULTS

Overall, the LLMs attained good accuracy in this study when compared against the NHS 111 responses. A score of ≥1 ("Good" or better) was achieved by 93% of responses across all LLMs. A score at this level means that at least part of the answer contained correct information and that no wrong information was present. Results were very similar across the two prompts, with no marked difference overall.
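
The six-point grading scale and the ≥1 threshold reported here can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: the names `Grade` and `share_at_least_good` are hypothetical, and the example scores are placeholders, not data from the study.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Six-point scale as listed in the Methods."""
    VERY_POOR = -2
    POOR = -1
    NO_RESPONSE = 0
    GOOD = 1
    VERY_GOOD = 2
    EXCELLENT = 3


def share_at_least_good(scores):
    """Return the fraction of scores graded "Good" (1) or better."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Grade(s) raises ValueError for any value that is not on the scale.
    grades = [Grade(s) for s in scores]
    return sum(g >= Grade.GOOD for g in grades) / len(grades)


# Placeholder scores for illustration only; these are NOT data from the study.
example_scores = [3, 2, 1, 0, -1, 2, 1, 3]
print(f"{share_at_least_good(example_scores):.0%}")  # prints "75%"
```

Applied to the graded responses of every chatbot under both prompts, a calculation of this kind would yield the proportion that the abstract reports as 93%.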

CONCLUSIONS

The high accuracy and safety observed in LLM responses support their potential as effective tools for providing timely information and guidance to patients. LLMs hold promise for enhancing patient care and healthcare accessibility in the digital age.
