Khan Shaheryar Ahmed, Gunasekera Chrishan
Ophthalmology Department, Moorfields Eye Hospital, London, UK.
Ophthalmology Department, Norfolk & Norwich University Hospital, Norwich, UK.
Eye (Lond). 2025 May;39(7):1301-1308. doi: 10.1038/s41433-025-03605-8. Epub 2025 Jan 21.
This study presents a comprehensive evaluation of the performance of various large language models (LLMs) in generating responses to ophthalmology emergencies and compares their accuracy with the United Kingdom's established National Health Service (NHS) 111 online system.
We included 21 ophthalmology-related emergency scenario questions from the NHS 111 triaging algorithm. These questions were based on four different ophthalmology emergency themes as laid out in the NHS 111 algorithm. Responses generated by NHS 111 online were compared with the responses of different LLM chatbots to determine the accuracy of the LLM responses. We included a range of models: ChatGPT-3.5, Google Bard, Bing Chat, and ChatGPT-4.0. The accuracy of each LLM chatbot response was compared against the NHS 111 triage using a two-prompt strategy. Answers were graded as follows: -2 "Very poor", -1 "Poor", 0 "No response", 1 "Good", 2 "Very good", and 3 "Excellent".
Overall, the LLMs attained good accuracy in this study when compared against the NHS 111 responses. A score of ≥1, graded as "Good", was achieved by 93% of all LLM responses, meaning that at least part of the answer contained correct information and none of it contained wrong information. Results were very similar across both prompts, with no marked difference between them.
The high accuracy and safety observed in LLM responses support their potential as effective tools for providing timely information and guidance to patients. LLMs hold promise for enhancing patient care and healthcare accessibility in the digital age.