
AutoPM3: enhancing variant interpretation via LLM-driven PM3 evidence extraction from scientific literature.

Authors

Li Shumin, Wang Yiding, Liu Chi-Man, Huang Yuanhua, Lam Tak-Wah, Luo Ruibang

Affiliations

Department of Computer Science, School of Computing and Data Science, University of Hong Kong, Hong Kong, 999077, China.

School of Biomedical Sciences, University of Hong Kong, Hong Kong, 999077, China.

Publication information

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf382.

DOI: 10.1093/bioinformatics/btaf382
PMID: 40586923
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12263107/
Abstract

MOTIVATION

Rare diseases affect over 300 million people worldwide and are often caused by genetic variants. While variant detection has become cost-effective, interpreting these variants, particularly collecting literature-based evidence such as ACMG/AMP PM3, remains complex and time-consuming.

RESULTS

We present AutoPM3, a method that automates PM3 evidence extraction from the literature using open-source large language models (LLMs). AutoPM3 combines a Text2SQL-based variant extractor and a retrieval-augmented generation (RAG) module, enhanced by a variant-specific retriever and a fine-tuned LLM, to process tables and text separately. We curated PM3-Bench, a dataset of 1027 variant-publication evidence pairs from ClinGen. On openly accessible pairs, AutoPM3 achieved 86.1% accuracy for variant hits and 72.5% recall for in trans variants, outperforming other methods, including those using larger models. A sequential ablation study showed the contribution of AutoPM3's key modules, especially the variant-specific retriever and Text2SQL. AutoPM3 located evidence in 76 s, demonstrating that open-source LLMs can offer an efficient, cost-effective solution for rare disease diagnosis.
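The two-path idea described above (structured queries over tables, variant-focused retrieval over text) can be illustrated with a minimal Python sketch. This is not the AutoPM3 implementation: the table schema, the function names (`query_table_for_variant`, `partner_alleles`, `retrieve_chunks`), the hand-written SQL standing in for an LLM-generated query, and the mention-count ranking are all assumptions made for illustration, and the variants and sentences are invented toy data.

```python
import re
import sqlite3


def query_table_for_variant(rows, variant):
    """Load (patient, allele1, allele2) rows into SQLite and select rows with the variant.

    In a Text2SQL setup an LLM would generate the query from a natural-language
    question; here the query is hand-written (an assumption of this sketch).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variants (patient TEXT, allele1 TEXT, allele2 TEXT)")
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?)", rows)
    sql = (
        "SELECT patient, allele1, allele2 FROM variants "
        "WHERE allele1 = ? OR allele2 = ? ORDER BY rowid"
    )
    hits = conn.execute(sql, (variant, variant)).fetchall()
    conn.close()
    return hits


def partner_alleles(hits, variant):
    """Return the other allele reported in each matching row.

    Co-occurrence in a row only suggests a candidate in trans partner;
    it does not establish phase.
    """
    partners = []
    for _patient, allele1, allele2 in hits:
        other = allele2 if allele1 == variant else allele1
        if other and other != variant:
            partners.append(other)
    return partners


def retrieve_chunks(chunks, variant, top_k=2):
    """Rank text chunks by the number of exact mentions of the variant.

    A simple stand-in for a variant-specific retriever: chunks that never
    mention the variant are dropped; ties keep document order.
    """
    pattern = re.compile(re.escape(variant))
    scored = [(len(pattern.findall(chunk)), idx, chunk) for idx, chunk in enumerate(chunks)]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for _count, _idx, chunk in scored[:top_k]]


# Invented toy data, for illustration only.
TABLE_ROWS = [
    ("P1", "c.100A>G", "c.250C>T"),
    ("P2", "c.300G>A", "c.100A>G"),
    ("P3", "c.400del", "c.500dup"),
]
TEXT_CHUNKS = [
    "Patient 1 carried c.100A>G and c.250C>T on separate alleles.",
    "No variants were found in controls.",
    "c.100A>G was inherited maternally; c.100A>G was absent in the father.",
]
```

Calling `query_table_for_variant(TABLE_ROWS, "c.100A>G")` followed by `partner_alleles(...)` yields the alleles that share a row with the query variant, and `retrieve_chunks(TEXT_CHUNKS, "c.100A>G")` returns the passages that would be handed to an LLM as context. A real system would also need to normalize variant nomenclature before matching, which this sketch skips.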

AVAILABILITY AND IMPLEMENTATION

AutoPM3 is implemented and freely available under the MIT license at https://github.com/HKU-BAL/AutoPM3.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b704/12263107/e86583a8a325/btaf382f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b704/12263107/3a318a452ec3/btaf382f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b704/12263107/6c4cec62cd36/btaf382f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b704/12263107/ce497191e3a4/btaf382f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b704/12263107/bc869f36bc93/btaf382f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b704/12263107/15fdfb70b429/btaf382f6.jpg
