
Enhancing Clinical Decision Support with Adaptive Iterative Self-Query Retrieval for Retrieval-Augmented Large Language Models.

Author Information

Prabha Srinivasagam, Gomez-Cabello Cesar A, Haider Syed Ali, Genovese Ariana, Trabilsy Maissa, Wood Nadia G, Bagaria Sanjay, Tao Cui, Forte Antonio J

Affiliations

Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL 32224, USA.

Department of Radiology AI IT, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA.

Publication Information

Bioengineering (Basel). 2025 Aug 21;12(8):895. doi: 10.3390/bioengineering12080895.

Abstract

Retrieval-Augmented Generation (RAG) offers a promising strategy for harnessing large language models (LLMs) to deliver up-to-date, accurate clinical guidance while reducing physicians' cognitive burden, yet its effectiveness hinges on query clarity and structure. We propose an adaptive Self-Query Retrieval (SQR) framework that integrates three refinement modules — PICOT (Population, Intervention, Comparison, Outcome, Time), SPICE (Setting, Population, Intervention, Comparison, Evaluation), and Iterative Query Refinement (IQR) — to automatically restructure and iteratively enhance clinical questions until they meet predefined retrieval-quality thresholds. Implemented on Gemini-1.0 Pro, SQR was benchmarked using thirty postoperative rhinoplasty queries, with responses evaluated for accuracy and relevance on a three-point Likert scale and retrieval quality measured via precision, recall, and F1 score; statistical significance was assessed by one-way ANOVA with Tukey post-hoc testing. The full SQR pipeline achieved 87% accuracy (Likert 2.4 ± 0.7) and 100% relevance (Likert 3.0 ± 0.0), significantly outperforming a non-refined RAG baseline (50% accuracy, 80% relevance; p < 0.01 and p = 0.03, respectively). Precision, recall, and F1 rose from 0.17, 0.39, and 0.24 to 0.53, 1.00, and 0.70, respectively, while PICOT-only and SPICE-only variants yielded intermediate improvements. These findings demonstrate that automated structuring and iterative enhancement of queries via SQR substantially elevate LLM-based clinical decision support, and the framework's model-agnostic architecture enables rapid adaptation across specialties, data sources, and LLM platforms.
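The refine-until-threshold loop described above can be sketched in outline. This is a minimal illustration, not the authors' implementation: the function names (`restructure_query`, `retrieve`, `evaluate`), the F1-based stopping criterion, and the threshold and iteration-cap values are all assumptions for exposition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sqr_refine(query, restructure_query, retrieve, evaluate,
               f1_threshold=0.6, max_iters=5):
    """Iteratively restructure a clinical query (e.g., into PICOT or
    SPICE form) and re-retrieve until retrieval quality meets a
    predefined threshold or the iteration budget is exhausted."""
    docs = []
    for _ in range(max_iters):
        query = restructure_query(query)          # PICOT/SPICE rewriting step
        docs = retrieve(query)                    # retrieval against the corpus
        precision, recall = evaluate(query, docs) # retrieval-quality scoring
        if f1_score(precision, recall) >= f1_threshold:
            break                                 # quality threshold met
    return query, docs


# Sanity check against the reported metrics: precision 0.53 with
# recall 1.00 gives F1 ≈ 0.69, consistent with the paper's 0.70
# after rounding; the baseline 0.17/0.39 gives F1 ≈ 0.24.
print(round(f1_score(0.53, 1.00), 2))  # → 0.69
print(round(f1_score(0.17, 0.39), 2))  # → 0.24
```

The key design point the abstract implies is that refinement is driven by retrieval-quality feedback rather than a fixed number of rewrites, which is what makes the pipeline adaptive.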


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ef3/12383471/ae71819ac41d/bioengineering-12-00895-g001.jpg
