Improving Dietary Supplement Information Retrieval: Development of a Retrieval-Augmented Generation System With Large Language Models.

Authors

Hou Yu, Bishop Jeffrey R, Liu Hongfang, Zhang Rui

Affiliations

Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN, United States.

Department of Experimental and Clinical Pharmacology, University of Minnesota, Minneapolis, MN, United States.

Publication information

J Med Internet Res. 2025 Mar 19;27:e67677. doi: 10.2196/67677.

Abstract

BACKGROUND

Dietary supplements (DSs) are widely used to improve health and nutrition, but challenges related to misinformation, safety, and efficacy persist due to less stringent regulations compared with pharmaceuticals. Accurate and reliable DS information is critical for both consumers and health care providers to make informed decisions.

OBJECTIVE

This study aimed to improve the accuracy and reliability of DS-related question answering by integrating a retrieval-augmented generation (RAG) system with the integrated Dietary Supplement Knowledgebase 2.0 (iDISK2.0).

METHODS

We developed iDISK2.0 by integrating updated data from authoritative sources, including the Natural Medicines Comprehensive Database, the Memorial Sloan Kettering Cancer Center database, the Dietary Supplement Label Database, and the Licensed Natural Health Products Database, and applied advanced data cleaning and standardization techniques to reduce noise. The RAG system combines the retrieval power of a biomedical knowledge graph with the generative capabilities of large language models (LLMs) to address limitations of stand-alone LLMs, such as hallucination. The system retrieves contextually relevant subgraphs from iDISK2.0 based on user queries, enabling accurate and evidence-based responses through a user-friendly interface. We evaluated the system using true-or-false and multiple-choice questions derived from the Memorial Sloan Kettering Cancer Center database and compared its performance with that of stand-alone LLMs.
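
The retrieve-then-generate flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Triple` class, the `TOY_GRAPH` entries, the naive substring entity matching, and the functions `retrieve_subgraph` and `build_prompt` are all hypothetical, and the call to an actual LLM is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triple:
    """One edge of a knowledge graph: head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str


# Toy triples invented for illustration only; they are NOT taken from iDISK2.0.
TOY_GRAPH = [
    Triple("supplement a", "interacts_with", "drug x"),
    Triple("supplement a", "is_effective_for", "condition y"),
    Triple("supplement b", "interacts_with", "drug z"),
]


def retrieve_subgraph(query: str, graph: List[Triple]) -> List[Triple]:
    """Return every triple whose head or tail entity is mentioned in the query.

    Assumption: entities are matched by naive lowercase substring search.
    A real system would need proper entity linking.
    """
    text = query.lower()
    return [t for t in graph if t.head in text or t.tail in text]


def build_prompt(query: str, subgraph: List[Triple]) -> str:
    """Put the retrieved facts in front of the question as grounding context."""
    facts = "\n".join(f"- {t.head} {t.relation} {t.tail}" for t in subgraph)
    return (
        "Answer using only the facts listed below.\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {query}"
    )


question = "Does supplement A interact with drug X?"
subgraph = retrieve_subgraph(question, TOY_GRAPH)
prompt = build_prompt(question, subgraph)
print(prompt)
```

In this sketch the prompt, rather than the bare question, would be passed to the LLM, so the answer is constrained by the retrieved subgraph instead of the model's parametric memory alone.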

RESULTS

iDISK2.0 integrates 174,317 entities across 7 categories, including 8091 dietary supplement ingredients; 163,806 dietary supplement products; 786 diseases; and 625 drugs, along with 6 types of relationships. The RAG system achieved an accuracy of 99% (990/1000) for true-or-false questions on DS effectiveness and 95% (948/1000) for multiple-choice questions on DS-drug interactions, substantially outperforming stand-alone LLMs like GPT-4o (OpenAI), which scored 62% (618/1000) and 52% (517/1000) on these respective tasks. The user interface enabled efficient interaction, supporting free-form text input and providing accurate responses. Integration strategies minimized data noise, ensuring access to up-to-date, DS-related information.
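
The reported percentages follow directly from the correct/total counts. The short sketch below recomputes them, assuming each task used 1000 questions (an assumption consistent with the reported 95%); the function name `accuracy_percent` and the dictionary labels are illustrative only.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rejecting impossible counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return 100.0 * correct / total


# (correct, total) pairs as reported in the abstract; a denominator of 1000
# is assumed for every task.
reported = {
    "RAG, true-or-false": (990, 1000),
    "RAG, multiple-choice": (948, 1000),
    "GPT-4o, true-or-false": (618, 1000),
    "GPT-4o, multiple-choice": (517, 1000),
}

for label, (correct, total) in reported.items():
    print(f"{label}: {accuracy_percent(correct, total):.1f}%")
```

The range check matters when transcribing results by hand: a correct count larger than its denominator raises an error instead of silently producing an accuracy above 100%.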

CONCLUSIONS

By integrating a robust knowledge graph with RAG and LLM technologies, iDISK2.0 addresses the critical limitations of stand-alone LLMs in DS information retrieval. This study highlights the importance of combining structured data with advanced artificial intelligence methods to improve accuracy and reduce misinformation in health care applications. Future work includes extending the framework to broader biomedical domains and improving evaluation with real-world, open-ended queries.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75d4/11966073/503221f1fb5f/jmir_v27i1e67677_fig1.jpg
