
Accuracy of Current Large Language Models and the Retrieval-Augmented Generation Model in Determining Dietary Principles in Chronic Kidney Disease.

Authors

Gençer Bingöl Feray, Ağagündüz Duygu, Bingol Mustafa Can

Affiliations

Assistant Professor, Department of Nutrition and Dietetics, Faculty of Health Science, Burdur Mehmet Akif Ersoy University, Burdur, Türkiye.

Associate Professor, Department of Nutrition and Dietetics, Faculty of Health Science, Gazi University, Ankara, Türkiye.

Publication information

J Ren Nutr. 2025 May;35(3):401-409. doi: 10.1053/j.jrn.2025.01.004. Epub 2025 Jan 24.

DOI:10.1053/j.jrn.2025.01.004
PMID:39864474
Abstract

OBJECTIVE

Large language models (LLMs) have emerged as powerful tools with significant potential for quickly accessing information in nutrition and health, as in many other fields. Retrieval-augmented generation (RAG) has been included among artificial intelligence (AI)-powered chatbot structures as a framework developed to increase the accuracy and capability of LLMs. This study aimed to evaluate the accuracy of LLMs (Generative Pre-trained Transformer 4, Gemini, and Llama) and RAG in determining dietary principles in chronic kidney disease.

DESIGN AND METHODS

The nutrition guideline published by the National Kidney Foundation in 2020 was used as the external information source in the developed RAG model. Answers were obtained from four chatbots using 12 medical nutritional therapy prompts for chronic kidney disease. The accuracy of the 48 answers generated by the chatbots was evaluated with a 5-point Likert scale.

RESULTS

The results showed that Gemini and RAG had the highest accuracy scores (median: 4.0), followed by Generative Pre-trained Transformer 4 (median: 2.5) and Llama (median: 1.5). When accuracy scores were compared pairwise between chatbots, a significant difference was detected for every pair except Gemini and RAG.
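
The evaluation design described above (four chatbots, 12 prompts each, 5-point Likert ratings, per-chatbot medians, pairwise comparisons) can be laid out in a few lines of Python. This is only an illustrative sketch: the names `CHATBOTS`, `summarize`, and `pairwise` are hypothetical, "GPT-4" is shorthand for Generative Pre-trained Transformer 4, and the ratings are uniform placeholders, not data from the study. The abstract does not name the statistical test used, so none is implemented here.

```python
from itertools import combinations
from statistics import median

# Labels for the four chatbots named in the abstract (labels are shorthand).
CHATBOTS = ["GPT-4", "Gemini", "Llama", "RAG"]
N_PROMPTS = 12  # medical nutritional therapy prompts per chatbot


def summarize(scores):
    """Return the median Likert score per chatbot after basic validation."""
    for name, values in scores.items():
        if len(values) != N_PROMPTS:
            raise ValueError(f"{name}: expected {N_PROMPTS} ratings")
        if any(v not in range(1, 6) for v in values):
            raise ValueError(f"{name}: ratings must be integers from 1 to 5")
    return {name: median(values) for name, values in scores.items()}


def pairwise(names):
    """List every unordered pair of chatbots to be compared."""
    return list(combinations(names, 2))


# Placeholder ratings only (every answer rated 3); NOT the study's data.
placeholder = {name: [3] * N_PROMPTS for name in CHATBOTS}

medians = summarize(placeholder)
pairs = pairwise(CHATBOTS)
total_answers = sum(len(v) for v in placeholder.values())

print(total_answers)  # 4 chatbots x 12 prompts = 48 answers
print(len(pairs))     # 6 pairwise comparisons among 4 chatbots
```

With four chatbots there are six possible pairs, so the reported result corresponds to five pairs differing significantly and one pair (Gemini and RAG) not.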

CONCLUSION

These chatbots produced both completely correct answers and false information with potentially harmful clinical outcomes. Customizing LLMs for specific areas such as nutrition, or developing a nutrition-specific RAG framework that improves LLM structures with current guidelines and articles, may be an important strategy to increase the accuracy of AI-powered chatbots.
