MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature.

Affiliations

Department of Computer Science, Wayne State University, Detroit, MI, USA.

Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA.

Publication information

Sci Rep. 2020 Jul 23;10(1):12365. doi: 10.1038/s41598-020-68649-0.

DOI: 10.1038/s41598-020-68649-0

PMID: 32703994

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7378213/

Abstract

Despite efforts to develop and maintain accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curating the biomedical literature to extract this information is challenging because of: (i) the complexity of natural language processing, (ii) inconsistent use of standard recommendations for variant description, and (iii) the lack of clarity and consistency in describing variant-genotype-phenotype associations in the biomedical literature. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts variant-gene-disease associations from the full-length biomedical literature and designs an evidence-based, variant-driven gene panel for a given condition. We validate the identified genes by showing their diagnostic ability to predict patients' clinical outcomes on several independent validation cohorts. As representative examples, we present our results for acute myeloid leukemia (AML), breast cancer, and prostate cancer. We compare these panels with other variant-driven gene panels obtained from ClinVar, Mastermind, and the literature, as well as with a panel identified with a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework yield better results than the other gene panels currently available in the literature.
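The idea of mining variant-gene associations from text can be illustrated with a minimal sketch. This is not the MAGPEL implementation: the regular expression, the function name `rank_genes`, and the sentence-level co-occurrence scoring are all assumptions made for illustration, and real variant mentions are far more varied than this pattern covers.

```python
import re
from collections import Counter

# Hypothetical pattern for protein-level substitutions such as
# "p.R175H" or "p.Arg175His". Illustration only, not an HGVS parser.
VARIANT_RE = re.compile(
    r"\bp\.(?:[A-Z][a-z]{2}|[A-Z])\d+(?:[A-Z][a-z]{2}|[A-Z])\b"
)


def rank_genes(sentences, gene_symbols):
    """Rank genes by the number of sentences that mention the gene
    together with at least one variant (simple co-occurrence)."""
    counts = Counter()
    for sentence in sentences:
        if not VARIANT_RE.search(sentence):
            continue  # no variant mention, so no association evidence
        # Split into alphanumeric tokens so gene symbols match whole words.
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        for gene in gene_symbols:
            if gene in tokens:
                counts[gene] += 1
    # Genes that never co-occur with a variant are omitted.
    return counts.most_common()


# Made-up example sentences, not taken from the paper.
example_sentences = [
    "The TP53 p.R175H mutation was found in AML patients.",
    "FLT3 expression was elevated.",
    "DNMT3A p.Arg882His co-occurred with TP53 p.R248Q.",
]
example_genes = ["TP53", "FLT3", "DNMT3A"]
ranking = rank_genes(example_sentences, example_genes)
```

Here `ranking` orders genes by how often they share a sentence with a variant mention. A candidate panel could then be taken from the top of such a ranking, although the scoring used in the paper may differ.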

Similar articles

MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature.

Sci Rep. 2020 Jul 23;10(1):12365. doi: 10.1038/s41598-020-68649-0.

Prescription of Controlled Substances: Benefits and Risks

Interventions for promoting habitual exercise in people living with and beyond cancer.

Cochrane Database Syst Rev. 2018 Sep 19;9(9):CD010192. doi: 10.1002/14651858.CD010192.pub3.

Automated devices for identifying peripheral arterial disease in people with leg ulceration: an evidence synthesis and cost-effectiveness analysis.

Health Technol Assess. 2024 Aug;28(37):1-158. doi: 10.3310/TWCG3912.

Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.

Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.

MarkVCID cerebral small vessel consortium: I. Enrollment, clinical, fluid protocols.

Alzheimers Dement. 2021 Apr;17(4):704-715. doi: 10.1002/alz.12215. Epub 2021 Jan 21.

Interventions for promoting habitual exercise in people living with and beyond cancer.

Cochrane Database Syst Rev. 2013 Sep 24(9):CD010192. doi: 10.1002/14651858.CD010192.pub2.

Antioxidants for male subfertility.

Cochrane Database Syst Rev. 2014(12):CD007411. doi: 10.1002/14651858.CD007411.pub3. Epub 2014 Dec 15.

Short-Term Memory Impairment

Prognostic factors for return to work in breast cancer survivors.

Cochrane Database Syst Rev. 2025 May 7;5(5):CD015124. doi: 10.1002/14651858.CD015124.pub2.

Cited by

Cutting-Edge AI Technologies Meet Precision Medicine to Improve Cancer Care.

Biomolecules. 2022 Aug 17;12(8):1133. doi: 10.3390/biom12081133.

Identifying and Validating Networks of Oncology Biomarkers Mined From the Scientific Literature.

Cancer Inform. 2022 Mar 22;21:11769351221086441. doi: 10.1177/11769351221086441. eCollection 2022.

Contextualizing Genes by Using Text-Mined Co-Occurrence Features for Cancer Gene Panel Discovery.

Front Genet. 2021 Oct 25;12:771435. doi: 10.3389/fgene.2021.771435. eCollection 2021.
