Xiong Guangzhi, Jin Qiao, Wang Xiao, Zhang Minjia, Lu Zhiyong, Zhang Aidong
Department of Computer Science, University of Virginia, VA 22904, USA
National Library of Medicine, National Institutes of Health, MD 20892, USA
Pac Symp Biocomput. 2025;30:199-214. doi: 10.1142/9789819807024_0015.
The emergent abilities of large language models (LLMs) have demonstrated great potential in solving medical questions. LLMs possess considerable medical knowledge, but they may still hallucinate and are inflexible in knowledge updates. While Retrieval-Augmented Generation (RAG) has been proposed to enhance the medical question-answering capabilities of LLMs with external knowledge bases, it may still fail in complex cases where multiple rounds of information-seeking are required. To address this issue, we propose iterative RAG for medicine (i-MedRAG), in which LLMs can iteratively ask follow-up queries based on previous information-seeking attempts. In each iteration of i-MedRAG, the follow-up queries are answered by a vanilla RAG system, and the resulting answers are further used to guide the query generation in the next iteration. Our experiments show that i-MedRAG improves the performance of various LLMs over vanilla RAG on complex questions from clinical vignettes in the United States Medical Licensing Examination (USMLE), as well as on various knowledge tests in the Massive Multitask Language Understanding (MMLU) dataset. Notably, our zero-shot i-MedRAG outperforms all existing prompt engineering and fine-tuning methods on GPT-3.5, achieving an accuracy of 69.68% on the MedQA dataset. In addition, we characterize the scaling properties of i-MedRAG with different numbers of iterations of follow-up queries and different numbers of queries per iteration. Our case studies show that i-MedRAG can flexibly ask follow-up queries to form reasoning chains, providing an in-depth analysis of medical questions. To the best of our knowledge, this is the first study to incorporate follow-up queries into medical RAG.
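In outline, each i-MedRAG iteration has the LLM propose follow-up queries, answers them with a vanilla RAG system, and feeds the accumulated question-answer pairs into query generation for the next round, with the final answer drawn from the full chain. The sketch below illustrates this loop under stated assumptions: the prompts, helper names, and the text-in/text-out LLM and retriever interfaces are placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of the i-MedRAG loop described in the abstract, assuming an
# LLM exposed as a text-in/text-out callable and a retriever returning relevant
# snippets. Prompts and interfaces here are illustrative assumptions only.
from typing import Callable

LLM = Callable[[str], str]              # prompt -> completion
Retriever = Callable[[str], list[str]]  # query -> retrieved snippets


def i_medrag(question: str, llm: LLM, retrieve: Retriever,
             n_iters: int = 3, queries_per_iter: int = 2) -> str:
    history: list[tuple[str, str]] = []  # accumulated (query, answer) pairs

    for _ in range(n_iters):
        # The LLM proposes follow-up queries conditioned on the original
        # question and all previous information-seeking attempts.
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
        prompt = (f"Question: {question}\nPrevious findings:\n{context}\n"
                  f"Write {queries_per_iter} follow-up search queries, one per line.")
        queries = llm(prompt).strip().splitlines()[:queries_per_iter]

        # Each follow-up query is answered by a vanilla RAG step:
        # retrieve supporting snippets, then answer from them.
        for q in queries:
            docs = "\n".join(retrieve(q))
            history.append((q, llm(f"Context:\n{docs}\nAnswer the query: {q}")))

    # The final answer uses the full chain of follow-up queries and
    # answers as a reasoning chain.
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return llm(f"Findings:\n{context}\nNow answer: {question}")
```

With this structure, setting n_iters=1 reduces to query decomposition over a single vanilla RAG pass, while larger values let later queries build on earlier answers, matching the scaling axes (iterations and queries per iteration) studied in the paper.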