EyeGPT for Patient Inquiries and Medical Education: Development and Validation of an Ophthalmology Large Language Model.

Authors

Chen Xiaolan, Zhao Ziwei, Zhang Weiyi, Xu Pusheng, Wu Yue, Xu Mingpu, Gao Le, Li Yinwen, Shang Xianwen, Shi Danli, He Mingguang

Affiliations

School of Optometry, The Hong Kong Polytechnic University, Hong Kong, China.

State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China.

Publication information

J Med Internet Res. 2024 Dec 11;26:e60063. doi: 10.2196/60063.

Abstract

BACKGROUND

Large language models (LLMs) have the potential to enhance clinical workflow and improve medical education, but they face challenges with specialized knowledge in ophthalmology.

OBJECTIVE

This study aims to enhance the ophthalmic knowledge of LLMs by refining a general LLM into an ophthalmology-specialized assistant for patient inquiries and medical education.

METHODS

We transformed Llama2 into an ophthalmology-specialized LLM, termed EyeGPT, through the following 3 strategies: prompt engineering for role-playing, fine-tuning with publicly available data sets filtered for eye-specific terminology (83,919 samples), and retrieval-augmented generation leveraging a medical database and 14 ophthalmology textbooks. Four board-certified ophthalmologists evaluated the EyeGPT variants using 120 questions from diverse categories in both simple and complex question-answering scenarios. The performance of the best EyeGPT model was then compared with that of the unassisted human physician group and the EyeGPT+human group. We proposed 4 metrics for assessment: accuracy, understandability, trustworthiness, and empathy. The proportion of hallucinations was also reported.
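The three strategies above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the keyword list, the role-play instruction, the token-overlap retriever, and all function names are hypothetical stand-ins chosen for illustration, and the abstract does not specify the actual filter terms, prompt wording, or retrieval method.

```python
import re

# Hypothetical keyword list; the abstract does not give the actual filter terms.
EYE_TERMS = {"eye", "ocular", "retina", "cornea", "glaucoma", "cataract", "ophthalmology"}

# Hypothetical role-play instruction; not the authors' prompt.
ROLE_PROMPT = "You are an ophthalmologist. Answer the question clearly and with empathy."


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def filter_samples(samples):
    """Keep only samples that mention at least one eye-specific term."""
    return [s for s in samples if tokenize(s) & EYE_TERMS]


def retrieve(question, passages, k=1):
    """Rank passages by token overlap with the question (a toy retriever)."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Combine the role instruction, retrieved context, and the question."""
    context = "\n".join(retrieve(question, passages, k))
    return f"{ROLE_PROMPT}\n\nReference:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Here `filter_samples` stands in for the data set filtering step, `ROLE_PROMPT` for role-playing, and `build_prompt` for retrieval-augmented generation; a real system would likely use a learned embedding retriever rather than token overlap.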

RESULTS

Compared with the original Llama2 model, the best fine-tuned model gave significantly better informed advice (mean 9.30, SD 4.42 vs mean 13.79, SD 5.70; P<.001) and produced fewer hallucinations (97/120, 80.8% vs 53/120, 44.2%; P<.001). Adding information retrieval from reliable sources, particularly ophthalmology textbooks, further improved the model's responses relative to the best fine-tuned model alone (mean 13.08, SD 5.43 vs mean 15.14, SD 4.64; P=.001) and reduced hallucinations (71/120, 59.2% vs 57/120, 47.4%; P=.02). Subgroup analysis revealed that EyeGPT was robust across common diseases, with consistent performance across different users and domains. Among the variants, the model integrating fine-tuning and book retrieval ranked highest, followed in order by the combination of fine-tuning and the manual database, standalone fine-tuning, and pure role-playing. EyeGPT was competitive with human ophthalmologists in understandability and empathy. With the assistance of EyeGPT, the performance of ophthalmologists improved notably.
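The hallucination proportions reported above can be recomputed from the raw counts with a short Python sketch. The function names and dictionary labels are hypothetical, and the reading that the first count in each pair belongs to the baseline is an assumption drawn from the wording of the abstract. The sketch does not reproduce the reported P values, because the abstract does not name the statistical test used.

```python
def rate_percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def reduction_points(baseline, improved, total):
    """Absolute drop in hallucination rate, in percentage points."""
    return round(100 * (baseline - improved) / total, 1)


TOTAL_QUESTIONS = 120  # number of evaluation questions stated in the abstract

# (baseline count, improved count); the ordering is assumed from the abstract's wording.
comparisons = {
    "llama2_vs_fine_tuned": (97, 53),
    "fine_tuned_vs_book_retrieval": (71, 57),
}

for name, (baseline, improved) in comparisons.items():
    print(
        name,
        rate_percent(baseline, TOTAL_QUESTIONS),
        rate_percent(improved, TOTAL_QUESTIONS),
        reduction_points(baseline, improved, TOTAL_QUESTIONS),
    )
```

Three of the four rates match the reported figures (80.8%, 44.2%, 59.2%). The fourth, 57/120, evaluates to 47.5%, which differs slightly from the reported 47.4%; the sketch leaves that difference unexplained.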

CONCLUSIONS

We introduced EyeGPT by refining a general-domain LLM and comprehensively compared and evaluated different strategies for developing an ophthalmology-specific assistant. Our results highlight EyeGPT's potential to assist ophthalmologists and patients in medical settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5113/11669878/3a442d2128b1/jmir_v26i1e60063_fig1.jpg
