Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study.

Authors

Wang Dingqiao, Liang Jiangbo, Ye Jinguo, Li Jingni, Li Jingpeng, Zhang Qikai, Hu Qiuling, Pan Caineng, Wang Dongliang, Liu Zhong, Shi Wen, Shi Danli, Li Fei, Qu Bo, Zheng Yingfeng

Affiliations

State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China.

Research Centre for SHARP Vision, The Hong Kong Polytechnic University, Hong Kong, China.

Publication information

J Med Internet Res. 2024 Nov 8;26:e58041. doi: 10.2196/58041.

DOI:10.2196/58041
PMID:39046096
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11584532/
Abstract

BACKGROUND

Large language models (LLMs) demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.

OBJECTIVE

This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the LLM's performance to accurately and safely respond to diabetes-related inquiries.

METHODS

The RISE, an innovative retrieval augmentation framework, comprises 4 steps: rewriting query, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions respectively. Assessments were conducted by clinicians for accuracy and comprehensiveness and by patients for understandability.
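
The four RISE steps named above could be chained roughly as in the following minimal Python sketch. The abstract does not describe the implementation, so every function name, the toy keyword-overlap retriever, the in-memory `CORPUS`, and the `llm` callable are hypothetical placeholders for illustration, not the authors' method.

```python
from typing import Callable, List

# Hypothetical in-memory knowledge base; the study's real sources are not
# described in the abstract.
CORPUS = [
    "Metformin is a common first-line medication for type 2 diabetes.",
    "Regular physical activity helps lower blood glucose levels.",
    "Diabetic retinopathy screening is recommended for people with diabetes.",
]


def rewrite_query(question: str) -> str:
    """Step 1 (rewriting query): normalise the question text.
    A real system might ask an LLM to rephrase it instead."""
    return question.strip().rstrip("?").lower()


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Step 2 (information retrieval): toy keyword-overlap ranking (assumed)."""
    terms = set(query.split())

    def overlap(doc: str) -> int:
        return len(terms & set(doc.lower().rstrip(".").split()))

    return sorted(corpus, key=overlap, reverse=True)[:k]


def summarize(passages: List[str], max_chars: int = 300) -> str:
    """Step 3 (summarization): join and truncate the passages.
    A real system would likely condense them with an LLM."""
    return " ".join(passages)[:max_chars]


def execute(question: str, context: str, llm: Callable[[str], str]) -> str:
    """Step 4 (execution): send the question plus retrieved context to the base model."""
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer using only the context."
    return llm(prompt)


def rise_answer(question: str, llm: Callable[[str], str]) -> str:
    """Run the four steps in order for a single question."""
    query = rewrite_query(question)
    passages = retrieve(query, CORPUS)
    context = summarize(passages)
    return execute(question, context, llm)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call; it simply returns the prompt it was given."""
    return prompt


if __name__ == "__main__":
    print(rise_answer("Does physical activity lower blood glucose?", echo_llm))
```

Passing the model as a plain callable keeps the sketch independent of any vendor API, so the same pipeline could in principle wrap each of the three base LLMs compared in the study.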

RESULTS

The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was also enhanced by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023 to February 5, 2024.
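
The percentages reported above can be reproduced from the counts in parentheses, read here as additional accurate responses out of 43 questions per model (129 across the three models). The short Python check below only redoes that arithmetic; the variable names and dictionary layout are illustrative assumptions.

```python
# Additional accurate responses with RISE, out of 43 questions per model
# (counts taken from the abstract; the interpretation is an assumption).
QUESTIONS_PER_MODEL = 43
extra_accurate = {"GPT-4": 3, "Claude 2": 8, "Google Bard": 4}


def gain_percent(extra: int, total: int) -> int:
    """Gain in accurate responses as a percentage, rounded to a whole number."""
    return round(100 * extra / total)


# Per-model gains, then the pooled gain over all 3 x 43 = 129 responses.
per_model = {name: gain_percent(n, QUESTIONS_PER_MODEL) for name, n in extra_accurate.items()}
overall = gain_percent(
    sum(extra_accurate.values()),
    QUESTIONS_PER_MODEL * len(extra_accurate),
)

print(per_model)
print(overall)
```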

CONCLUSIONS

The RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaa/11584532/2118df4be0e5/jmir_v26i1e58041_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaa/11584532/444edc8475a5/jmir_v26i1e58041_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaa/11584532/e6e8e7a07da8/jmir_v26i1e58041_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaa/11584532/06ed64d7f10a/jmir_v26i1e58041_fig4.jpg
