Unsupervised text mining for assessing and augmenting GWAS results.

Author information

Ailem Melissa, Role François, Nadif Mohamed, Demenais Florence

Affiliations

LIPADE, Université Paris Descartes, Sorbonne Paris Cité, Paris F-75006, France.

INSERM, Genetic Variation and Human Diseases Unit, UMR-946, Paris F-75010, France; Institut Universitaire d'Hématologie, Université Paris Diderot, Sorbonne Paris Cité, Paris F-75010, France.

Publication information

J Biomed Inform. 2016 Apr;60:252-9. doi: 10.1016/j.jbi.2016.02.008. Epub 2016 Feb 19.

DOI:10.1016/j.jbi.2016.02.008

PMID:26911523

Abstract

Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply confirm hypothesized relationships between biological entities. We address this question in the context of genome-wide association studies (GWAS), an actively developing field that has helped identify many genes associated with multifactorial diseases. These studies can identify groups of genes associated with the same phenotype, but they provide no information about the relationships between these genes. Our objective is therefore to augment GWAS results with unsupervised text mining techniques, namely text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors. We propose a generic framework, which we used to characterize the relationships between 10 genes reported to be associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value < 0.01). Clustering the observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes, and thus contributed to the discovery of new candidate genes for asthma.
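
The comparison of candidate genes against random genes can be illustrated with a minimal sketch. The abstract does not say how the gene vectors are built or how the null distribution is computed, so everything below is an assumption for illustration: each gene is a row of term weights, the statistic is the mean pairwise cosine similarity, and the one-sided p-value is the share of randomly drawn gene sets that are at least as similar as the candidates. The function names, the toy data, and the number of draws are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_pairwise_cosine(vectors):
    """Mean cosine similarity over all distinct pairs of row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Leave all-zero rows unscaled so they do not cause a division by zero.
    unit = vectors / np.where(norms == 0, 1.0, norms)
    sims = unit @ unit.T
    upper = np.triu_indices(len(vectors), k=1)
    return float(sims[upper].mean())


def empirical_p_value(gene_vectors, candidate_idx, n_draws=1000, seed=0):
    """One-sided empirical p-value against random gene sets of the same size."""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_cosine(gene_vectors[candidate_idx])
    # Random sets are drawn from the genes that are not candidates.
    background = np.setdiff1d(np.arange(len(gene_vectors)), candidate_idx)
    hits = 0
    for _ in range(n_draws):
        sample = rng.choice(background, size=len(candidate_idx), replace=False)
        if mean_pairwise_cosine(gene_vectors[sample]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_draws + 1)


# Toy data (hypothetical): 200 genes described by 50 term weights each.
toy_rng = np.random.default_rng(42)
vectors = toy_rng.random((200, 50))
candidates = np.arange(10)
# Give the 10 "candidate" genes a shared vocabulary signal.
vectors[candidates, :5] += 3.0

observed, p_value = empirical_p_value(vectors, candidates)
print(f"mean pairwise cosine = {observed:.3f}, one-sided p = {p_value:.4f}")
```

In a real setting the rows would come from literature-derived term weights for each gene rather than from a random generator, and the same vectors could then be passed to a clustering routine to group candidate and random genes.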
