MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature.

Author information

Department of Computer Science, Wayne State University, Detroit, MI, USA.

Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA.

Publication information

Sci Rep. 2020 Jul 23;10(1):12365. doi: 10.1038/s41598-020-68649-0.

Abstract

Despite efforts to develop and maintain accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curating the literature to extract this information is challenging due to: (i) the complexity of natural language processing, (ii) inconsistent use of standard recommendations for variant description, and (iii) the lack of clarity and consistency in describing variant-genotype-phenotype associations in the biomedical literature. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts variant-gene-disease associations from the full-length biomedical literature and designs an evidence-based, variant-driven gene panel for a given condition. We validate the identified genes by showing their ability to predict patients' clinical outcomes in several independent validation cohorts. As representative examples, we present our results for acute myeloid leukemia (AML), breast cancer, and prostate cancer. We compare these panels with other variant-driven gene panels obtained from ClinVar, Mastermind, and the literature, as well as with a panel identified with a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework yield better results than the other gene panels currently available in the literature.
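To make the extraction step more concrete, the following is a minimal sketch of how variant mentions might be linked to genes by sentence-level co-occurrence. It is not the MAGPEL implementation: the regular expression, the function name `rank_genes`, and the example sentences are all hypothetical, and real variant mentions in the literature are far more varied than these simplified HGVS-like patterns cover.

```python
import re
from collections import Counter

# Simplified, illustrative patterns (assumption: not the patterns used by MAGPEL).
VARIANT_RE = re.compile(
    r"\b(?:c\.\d+[ACGT]>[ACGT]"           # cDNA substitution, e.g. c.35G>A
    r"|p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}"  # protein change, e.g. p.Gly12Asp
    r"|rs\d+)\b"                          # dbSNP identifier, e.g. rs121913529
)


def rank_genes(sentences, gene_symbols):
    """Count, per gene, the sentences that mention it alongside a variant.

    `gene_symbols` is a set of case-sensitive gene symbols (hypothetical input).
    Returns (gene, count) pairs sorted by descending count.
    """
    counts = Counter()
    for sentence in sentences:
        # Skip sentences without any variant-like mention.
        if not VARIANT_RE.search(sentence):
            continue
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        for gene in gene_symbols & tokens:
            counts[gene] += 1
    return counts.most_common()


if __name__ == "__main__":
    example_sentences = [
        "The KRAS c.35G>A variant was recurrent in AML samples.",
        "FLT3 p.Asp835Tyr and KRAS p.Gly12Asp were both reported.",
        "NPM1 expression was elevated.",  # no variant mention, so ignored
    ]
    for gene, n in rank_genes(example_sentences, {"KRAS", "FLT3", "NPM1"}):
        print(gene, n)
```

Genes that repeatedly co-occur with variant mentions rise to the top of the ranking, which is one simple way to read "variant-driven" evidence; a real pipeline would also need gene and disease normalization, which this sketch omits.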
