Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study.

Authors

Wang Dingqiao, Liang Jiangbo, Ye Jinguo, Li Jingni, Li Jingpeng, Zhang Qikai, Hu Qiuling, Pan Caineng, Wang Dongliang, Liu Zhong, Shi Wen, Shi Danli, Li Fei, Qu Bo, Zheng Yingfeng

Affiliations

State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China.

Research Centre for SHARP Vision, The Hong Kong Polytechnic University, Hong Kong, China.

Publication information

J Med Internet Res. 2024 Nov 8;26:e58041. doi: 10.2196/58041.

Abstract

BACKGROUND

Large language models (LLMs) have demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Because diabetes requires self-management, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate how well it enables LLMs to provide accurate responses to diabetes-related inquiries.

OBJECTIVE

This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the ability of LLMs to respond accurately and safely to diabetes-related inquiries.

METHODS

RISE, a retrieval augmentation framework, comprises 4 steps: query rewriting, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, and Google Bard) and the RISE-enhanced version of each. Clinicians assessed the responses for accuracy and comprehensiveness, and patients assessed them for understandability.
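The 4 steps named above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the retriever, the prompts, or the data sources, so the keyword-overlap retriever, the prompt wording, and every name below (`LLM`, `Passage`, `rewrite_query`, `retrieve`, `summarize`, `execute`, `rise_answer`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class Passage:
    source: str  # label for where the text came from (hypothetical)
    text: str


def rewrite_query(question: str, llm: LLM) -> str:
    """Step 1: ask the model to restate the question as a search query."""
    return llm(f"Rewrite as a concise search query: {question}").strip()


def retrieve(query: str, corpus: List[Passage], k: int = 2) -> List[Passage]:
    """Step 2: rank passages by naive keyword overlap.

    This is a stand-in for a real retriever, which the abstract does not specify.
    """
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most k passages, and only those sharing at least one term.
    return [p for score, p in scored[:k] if score > 0]


def summarize(passages: List[Passage], llm: LLM) -> str:
    """Step 3: condense the retrieved passages into a short evidence summary."""
    joined = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return llm(f"Summarize the key facts:\n{joined}").strip()


def execute(question: str, summary: str, llm: LLM) -> str:
    """Step 4: answer the original question, grounded in the summary."""
    prompt = f"Using only this evidence:\n{summary}\n\nAnswer: {question}"
    return llm(prompt).strip()


def rise_answer(question: str, corpus: List[Passage], llm: LLM, k: int = 2) -> str:
    """Chain the four steps: rewrite, retrieve, summarize, execute."""
    query = rewrite_query(question, llm)
    passages = retrieve(query, corpus, k)
    summary = summarize(passages, llm)
    return execute(question, summary, llm)
```

Passing the model in as a plain callable keeps the sketch independent of any vendor API, so the same pipeline could in principle wrap each of the 3 base models for a side-by-side comparison.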

RESULTS

The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12 percentage points (15/129) with RISE. Specifically, the rate of accurate responses increased by 7 percentage points (3/43) for GPT-4, 19 (8/43) for Claude 2, and 9 (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability also improved, by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023, to February 5, 2024.
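The reported gains can be checked with a few lines of arithmetic. The counts below are taken from the fractions in the abstract (3/43, 8/43, 4/43); the variable and function names are only illustrative. The pooled figure uses 129 as the denominator because each of the 3 models answered the same 43 questions.

```python
# Additional accurate responses per model, out of 43 questions each,
# as given by the fractions in the abstract.
gains = {"GPT-4": 3, "Claude 2": 8, "Google Bard": 4}
QUESTIONS = 43


def pct(numerator: int, denominator: int) -> int:
    """Return the ratio as a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Gain for each model, in percentage points.
per_model = {name: pct(n, QUESTIONS) for name, n in gains.items()}

# Pooled gain across all 3 models: 15 extra accurate responses out of 129.
total_gain = sum(gains.values())
total_responses = QUESTIONS * len(gains)
overall = pct(total_gain, total_responses)

print(per_model)  # {'GPT-4': 7, 'Claude 2': 19, 'Google Bard': 9}
print(f"{total_gain}/{total_responses} -> {overall}")  # 15/129 -> 12
```

The per-model counts sum to 15 and the rounded values match the abstract, so the 12% average is the pooled absolute gain across models, not a relative improvement over the baseline accuracy.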

CONCLUSIONS

RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have important implications for RISE's future role in patient education and chronic illness self-management, which could help relieve pressure on medical resources and raise public awareness of medical knowledge.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaa/11584532/2118df4be0e5/jmir_v26i1e58041_fig1.jpg
