Du Xinsong, Nagy Anna, Oates Michael F, Wang Yifei, Wang Xinyi, Plasek Joseph M, Aronson Samuel J, Lebo Matthew S, Zhou Li
Department of Medicine, Brigham and Women's Hospital and Harvard Medical School.
Department of Biomedical Informatics, Harvard Medical School.
medRxiv. 2025 Jun 10:2025.06.09.25329279. doi: 10.1101/2025.06.09.25329279.
Accurate interpretation of genetic variants is critical for precision medicine. While large language models (LLMs) show promise for summarization, they are prone to hallucinations. In this study, we therefore propose a novel approach named "precision grounding" that augments LLMs with a query tool that integrates evidence-based, variant-specific information to improve summarization accuracy.
Unlike traditional retrieval-augmented generation (RAG) methods, which retrieve information via document embeddings from a vector database, precision grounding uses a domain-specific query tool to access evidence-based databases through unique identifiers. For variant summarization, we developed CATT (https://shorturl.at/pw81X), an open-source tool that integrates ClinGen, ClinVar, and GenCC data. Users can query and retrieve curated evidence by Variation ID to ground LLM outputs. We compared our approach with web-search grounding-based RAG using 50 expert-selected variants.
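The contrast between identifier-based lookup and embedding-based retrieval can be illustrated with a minimal Python sketch. This is not the CATT implementation or its API: the `CuratedRecord` type, the `EVIDENCE` store, the functions `lookup` and `build_grounded_prompt`, the ID `VARIATION_ID_A`, and all statements are hypothetical placeholders chosen for illustration. The sketch only shows the general idea of fetching curated records by exact identifier and placing them in the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedRecord:
    """Hypothetical, simplified stand-in for one curated evidence entry."""
    source: str     # database name, e.g. "ClinVar", "ClinGen", "GenCC"
    statement: str  # curated text attached to the variant


# Placeholder store keyed by Variation ID. The ID and the statements are
# invented for illustration and carry no clinical meaning.
EVIDENCE = {
    "VARIATION_ID_A": [
        CuratedRecord("ClinVar", "placeholder classification statement"),
        CuratedRecord("GenCC", "placeholder gene-disease validity statement"),
    ],
}


def lookup(variation_id: str) -> list[CuratedRecord]:
    """Exact-match lookup: return records for this ID only, or raise.

    Unlike nearest-neighbour search over embeddings, an exact key either
    matches the requested variant or fails; it cannot silently return
    evidence for a similar-looking but different variant.
    """
    try:
        return EVIDENCE[variation_id]
    except KeyError:
        raise KeyError(f"no curated evidence for {variation_id!r}") from None


def build_grounded_prompt(variation_id: str) -> str:
    """Assemble an LLM prompt restricted to the retrieved curated evidence."""
    records = lookup(variation_id)
    evidence = "\n".join(f"- [{r.source}] {r.statement}" for r in records)
    return (
        f"Summarize variant {variation_id} using ONLY the evidence below.\n"
        f"{evidence}\n"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("VARIATION_ID_A"))
```

In this sketch an unknown identifier raises an error instead of falling back to a loosely related document, which is one plausible reason an exact-key design would reduce wrong-variant summaries.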
GPT-4o was selected because it performed well on our task in a pilot test. Using GPT-4o, we found that our precision grounding approach outperformed web-search grounding, achieving significantly higher accuracy and completeness scores of 4.76 (+0.74) and 4.94 (+0.84), respectively, on a 5-point Likert scale. Error analysis revealed that precision grounding reduced clinically significant hallucinations, such as incorrect pathogenicity classification and summarization of the wrong variant.
The precision grounding approach outperformed web-search grounding for genetic variant summarization. Our open-source tool, CATT, enables integration of curated, domain-specific knowledge and reduces hallucinations in LLM outputs.