Capturing the spectrum of interaction effects in genetic association studies by simulated evaporative cooling network analysis.

Authors

McKinney Brett A, Crowe James E, Guo Jingyu, Tian Dehua

Affiliation

Department of Genetics, University of Alabama School of Medicine, Birmingham, AL, USA.

Publication

PLoS Genet. 2009 Mar;5(3):e1000432. doi: 10.1371/journal.pgen.1000432. Epub 2009 Mar 20.

Abstract

Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate when applied to common multigenic diseases, because such methods investigate association with the phenotype only one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data due to their ability to account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. 
Our results suggest an important role for hubs in SNP disease susceptibility networks. The software is available at http://sites.google.com/site/McKinneyLab/software.
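
The abstract says that GAIN edges quantify the synergy between variants with respect to the phenotype, but it does not state the formula. As a minimal sketch, assume synergy is measured by interaction information, I(A,B;C) - I(A;C) - I(B;C), where A and B are variants and C is the phenotype. That choice, and the names `entropy`, `mutual_information`, `synergy`, `snp_a`, `snp_b`, and `phenotype`, are illustrative assumptions and not taken from the authors' software.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def synergy(a, b, c):
    """Assumed edge weight: I(A,B;C) - I(A;C) - I(B;C).

    Positive values mean the pair (A, B) tells us more about C
    than the two variants do separately.
    """
    joint = list(zip(a, b))
    return (mutual_information(joint, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


# Toy data (hypothetical): a purely epistatic phenotype, the XOR of two
# binary variants. Neither variant alone is informative.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]

xor_edge = synergy(snp_a, snp_b, phenotype)

# Contrast: a phenotype driven by snp_a alone has no pairwise synergy.
main_effect_edge = synergy(snp_a, snp_b, snp_a)

print(f"XOR edge weight: {xor_edge:.3f} bits")
print(f"Main-effect edge weight: {main_effect_edge:.3f} bits")
```

In the XOR case each variant has zero mutual information with the phenotype on its own, so a one-locus-at-a-time test would miss both, while the pairwise term recovers the full bit. This is the kind of effect the abstract says single-locus methods cannot capture.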
