Gençer Bingöl Feray, Ağagündüz Duygu, Bingol Mustafa Can
Assistant Professor, Department of Nutrition and Dietetics, Faculty of Health Science, Burdur Mehmet Akif Ersoy University, Burdur, Türkiye.
Associate Professor, Department of Nutrition and Dietetics, Faculty of Health Science, Gazi University, Ankara, Türkiye.
J Ren Nutr. 2025 May;35(3):401-409. doi: 10.1053/j.jrn.2025.01.004. Epub 2025 Jan 24.
Large language models (LLMs) have emerged as powerful tools with significant potential for quickly accessing information in nutrition and health, as in many other fields. Retrieval-augmented generation (RAG), a framework developed to improve the accuracy and capabilities of LLMs, has been incorporated into artificial intelligence (AI)-powered chatbot architectures. This study aimed to evaluate the accuracy of three LLMs (Generative Pre-trained Transformer 4, Gemini, and Llama) and a RAG model in determining dietary principles for chronic kidney disease.
The nutrition guideline published by the National Kidney Foundation in 2020 was used as the external information source for the RAG model developed in this study. Each of the four chatbots was given 12 medical nutrition therapy prompts on chronic kidney disease, and the accuracy of the resulting 48 answers was evaluated with a 5-point Likert scale.
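As a rough illustration of how a RAG model grounds answers in an external source such as the guideline, the following sketch splits a document into chunks, retrieves the chunks most similar to a prompt, and prepends them to the prompt before it would be sent to an LLM. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not describe the retrieval method, chunk size, or prompt template. The bag-of-words cosine similarity here stands in for the dense embeddings typical of RAG pipelines. The helper names `chunk`, `retrieve`, and `build_prompt`, the defaults `size=50` and `k=3`, and the instruction wording are all hypothetical. The LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, size=50):
    """Split a guideline document into chunks of roughly `size` words (hypothetical default)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query (assumption: lexical, not embedding, similarity)."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query, guideline_text, k=3):
    """Augment the user question with retrieved guideline passages (hypothetical template)."""
    context = "\n\n".join(retrieve(query, chunk(guideline_text), k))
    return (
        "Answer using only the guideline excerpts below.\n\n"
        f"Guideline excerpts:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

In use, `build_prompt(question, guideline_text)` would return a string to pass to whichever LLM is being augmented. Swapping the word-count similarity for an embedding model and a vector index is the usual way to scale this beyond a single document.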
Gemini and RAG achieved the highest accuracy scores (median: 4.0), followed by Generative Pre-trained Transformer 4 (median: 2.5) and Llama (median: 1.5). In pairwise comparisons of accuracy scores, significant differences were found between all pairs of chatbots except Gemini and RAG.
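The reported medians and pairwise comparisons can be reproduced in outline as follows. The abstract does not name the statistical test. This sketch therefore assumes a two-sided Mann-Whitney U test with a normal approximation, tie correction, and continuity correction, and it treats each chatbot's 12 ratings as an independent sample. Because all four chatbots answered the same 12 prompts, a paired test (Wilcoxon signed-rank, or Friedman across all four) would be a reasonable alternative. The names `average_ranks`, `mann_whitney_u`, and `compare_chatbots` and the threshold `alpha=0.05` are hypothetical. No multiple-comparison correction is applied across the six pairs, and no study data are included.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last sorted position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Assumption: x and y are independent samples; the abstract does not
    state which test the study used.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # every rating tied: no evidence of a difference
    # Continuity-corrected z; two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    z = (abs(u1 - n1 * n2 / 2) - 0.5) / math.sqrt(var)
    p = math.erfc(max(z, 0.0) / math.sqrt(2))
    return u1, p


def compare_chatbots(scores, alpha=0.05):
    """scores: dict of chatbot name -> list of 1-5 Likert ratings (hypothetical input)."""
    medians = {name: median(vals) for name, vals in scores.items()}
    pairs = []
    for a, b in combinations(scores, 2):
        u, p = mann_whitney_u(scores[a], scores[b])
        pairs.append((a, b, u, p, p < alpha))
    return medians, pairs
```

`compare_chatbots` takes a dict with one rating per prompt for each chatbot and returns the per-chatbot medians plus a `(name_a, name_b, U, p, significant)` tuple for each of the six pairs. With only 12 ratings per group and many ties on a 5-point scale, a permutation p-value would be more reliable than the normal approximation used here.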
The chatbots produced both completely correct answers and false information with potentially harmful clinical consequences. Customizing LLMs for specific domains such as nutrition, or developing a nutrition-specific RAG framework that augments LLMs with current guidelines and articles, may be an important strategy for increasing the accuracy of AI-powered chatbots.