Department of Computer Science, Wayne State University, Detroit, MI, USA.
Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA.
Sci Rep. 2020 Jul 23;10(1):12365. doi: 10.1038/s41598-020-68649-0.
Despite efforts to develop and maintain accurate variant databases, a large number of disease-associated variants remain hidden in the biomedical literature. Curating the literature to extract this information is challenging because of: (i) the complexity of natural language processing, (ii) inconsistent use of standard recommendations for variant description, and (iii) the lack of clarity and consistency in describing variant-genotype-phenotype associations. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts variant-gene-disease associations from full-length biomedical literature and designs an evidence-based, variant-driven gene panel for a given condition. We validate the identified genes by showing their diagnostic ability to predict patients' clinical outcomes on several independent validation cohorts. As representative examples, we present results for acute myeloid leukemia (AML), breast cancer, and prostate cancer. We compare these panels with other variant-driven gene panels obtained from ClinVar, Mastermind, and the literature, as well as with a panel identified with a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework yield better results than the other gene panels currently available in the literature.
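The abstract does not describe how the extraction is implemented, so the following is only a minimal sketch of the general idea of mining variant-gene-disease co-occurrences, not the authors' framework. It assumes sentence-level co-occurrence, a deliberately simplified regular expression for HGVS-style protein variants, and plain substring matching for disease terms (which can produce false hits inside longer words); the function name `rank_genes`, the pattern, and the toy sentences are all hypothetical and purely illustrative.

```python
import re
from collections import Counter

# Assumption: a simplified pattern for HGVS-style protein substitutions,
# e.g. "p.Arg132His" or "p.V600E". Real variant mentions are far more varied.
VARIANT_RE = re.compile(r"\bp\.(?:[A-Z][a-z]{2}|[A-Z])\d+(?:[A-Z][a-z]{2}|[A-Z]|\*)")


def rank_genes(sentences, genes, disease_terms):
    """Count, per gene, the sentences that mention the gene together with
    a variant and at least one disease term; return genes by descending count."""
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Keep only sentences that mention the condition of interest.
        if not any(term.lower() in lowered for term in disease_terms):
            continue
        # Keep only sentences that contain a variant mention.
        if not VARIANT_RE.search(sentence):
            continue
        for gene in genes:
            # Whole-word, case-sensitive match on the gene symbol.
            if re.search(rf"\b{re.escape(gene)}\b", sentence):
                counts[gene] += 1
    return counts.most_common()


# Toy input written for illustration only; not taken from the study.
toy_sentences = [
    "The IDH1 p.Arg132His variant was recurrent in AML patients.",
    "FLT3 expression was elevated in AML.",
    "DNMT3A p.R882H and IDH1 p.R132C co-occurred in acute myeloid leukemia.",
    "BRAF p.V600E is common in melanoma.",
]
panel = rank_genes(
    toy_sentences,
    genes=["IDH1", "FLT3", "DNMT3A", "BRAF"],
    disease_terms=["AML", "acute myeloid leukemia"],
)
print(panel)
```

A ranking of this kind could serve as a crude candidate list for a condition-specific gene panel; a realistic pipeline would additionally need full-text retrieval, gene and variant normalization, and the outcome-based validation the abstract mentions.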