Padhukasahasram Badri, Reddy Chandan K, Levin Albert M, Burchard Esteban G, Williams L Keoki
Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, Michigan, United States of America.
Department of Computer Science, Wayne State University, Detroit, Michigan, United States of America.
PLoS One. 2015 Nov 30;10(11):e0143489. doi: 10.1371/journal.pone.0143489. eCollection 2015.
Multi-marker approaches have recently received considerable attention in genome-wide association studies and, under certain conditions, can increase the power to detect new associations. Gene-, gene-set-, and pathway-based association tests are increasingly viewed as useful supplements to the more widely used single-marker association analysis, which has successfully uncovered numerous disease variants. A major drawback of single-marker methods is that they do not consider the joint effects of multiple genetic variants, each of which may individually carry only a weak or moderate signal. Here, we describe novel tests for multi-marker association analysis that are based on phenotype predictions obtained from machine learning algorithms. Instead of assuming a linear or logistic regression model, we propose using ensembles of diverse machine learning algorithms for prediction. We show that phenotype predictions obtained from ensemble learning algorithms provide a new framework for multi-marker association analysis: they can be used to construct tests for the joint association of multiple variants, to adjust for covariates, and to test for the presence of interactions. To demonstrate the power and utility of this approach, we first apply our method to simulated SNP datasets and show that it has the correct type I error rates and can be considerably more powerful than alternative approaches in some situations. We then apply our method to previously studied asthma-related genes in two independent asthma cohorts to conduct association tests.
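The general idea of a prediction-based multi-marker test can be illustrated with a minimal sketch. The abstract does not specify the test statistic or the learners, so everything below is an assumption made for illustration: two deliberately simple predictors (a k-nearest-neighbour rule and an average of per-SNP genotype-class means) stand in for an ensemble of diverse machine learning algorithms, the statistic is the correlation between cross-validated ensemble predictions and the observed phenotype, and significance is assessed by permuting the phenotype. All function names are hypothetical and are not taken from the paper.

```python
import math
import random
from statistics import fmean


def simulate(n, n_snps, effect, rng):
    """Toy data (assumption): genotypes coded 0/1/2; only the first two SNPs affect the phenotype."""
    X = [[sum(rng.random() < 0.4 for _ in range(2)) for _ in range(n_snps)]
         for _ in range(n)]
    y = [effect * (row[0] + row[1]) + rng.gauss(0.0, 0.5) for row in X]
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Learner 1: mean phenotype of the k training samples with the closest genotypes."""
    dist = [sum(abs(a - b) for a, b in zip(row, x)) for row in train_X]
    nearest = sorted(range(len(train_X)), key=dist.__getitem__)[:k]
    return fmean([train_y[i] for i in nearest])


def additive_predict(train_X, train_y, x):
    """Learner 2: average over SNPs of the training mean phenotype for the sample's genotype class."""
    overall = fmean(train_y)
    per_snp = []
    for j, g in enumerate(x):
        ys = [y for row, y in zip(train_X, train_y) if row[j] == g]
        per_snp.append(fmean(ys) if ys else overall)
    return fmean(per_snp)


def cv_ensemble_predictions(X, y, n_folds=5):
    """Out-of-fold predictions: each sample is predicted by learners trained without it."""
    n = len(y)
    preds = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in range(fold, n, n_folds):
            # Unweighted average of the two learners' predictions.
            preds[i] = 0.5 * (knn_predict(train_X, train_y, X[i])
                              + additive_predict(train_X, train_y, X[i]))
    return preds


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def permutation_pvalue(X, y, n_perm=99, seed=0):
    """One-sided permutation p-value for the joint association of all SNPs in X with y."""
    rng = random.Random(seed)
    observed = pearson(cv_ensemble_predictions(X, y), y)
    exceed = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # breaks any genotype-phenotype link
        if pearson(cv_ensemble_predictions(X, y_perm), y_perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    X, y = simulate(60, 5, 1.0, random.Random(1))
    stat, p = permutation_pvalue(X, y)
    print(f"statistic = {stat:.3f}, permutation p-value = {p:.3f}")
```

Because the learners are refitted on every permuted phenotype, the null distribution reflects the whole prediction pipeline rather than a single fitted model. Adjusting for covariates or testing for interactions, as mentioned in the abstract, would need further steps that this sketch does not attempt.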