Noffisat O. Oki, Alison A. Motsinger-Reif
Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
Front Genet. 2011 Nov 21;2:80. doi: 10.3389/fgene.2011.00080. eCollection 2011.
Advances in genotyping technology and the wealth of genetic data now available provide a vast resource that is proving useful in the quest to better understand human genetic diseases through the study of genetic variation. This has led to the development of approaches such as genome-wide association studies (GWAS), designed specifically to interrogate variants across the genome for association with disease, typically by testing single-locus, univariate associations. More recently, it has been accepted that epistatic (interaction) effects may also contribute substantially to these genetic effects, and GWAS methods are now being applied to find epistatic effects. The challenge for these methods remains the prioritization and interpretation of results, particularly since it has become standard for initial findings to be independently investigated in replication cohorts or functional studies. This is motivating the development and implementation of filter-based approaches that prioritize variants found to be significant in a discovery stage for follow-up replication. Such filters must be able to detect both univariate and interactive effects. In the current study, we present and evaluate the use of multifactor dimensionality reduction (MDR) as such a filter, using simulated data across a wide range of effect sizes. Additionally, we compare the performance of the MDR filter with a similar filter approach based on logistic regression (LR), the more traditional approach used in GWAS analysis, as well as with evaporative cooling (EC), another prominent machine learning filtering method. The results of our simulation study show that MDR is an effective method for such prioritization and that it can detect main effects as well as interactions with or without marginal effects. Importantly, it performed as well as EC and LR for main effect models. It also significantly outperformed LR for various two-locus epistatic models, while performing equivalently to EC for the epistatic models. These results demonstrate the potential of MDR as a filter to detect gene-gene interactions in GWAS.
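The abstract does not spell out how the MDR filter ranks SNPs, so the following is only a minimal sketch under stated assumptions. It implements the core MDR step (pool individuals into multilocus genotype cells, label a cell high-risk when its case:control ratio meets or exceeds the overall case:control ratio, then score the resulting classifier by balanced accuracy) and pairs it with a hypothetical ranking rule: each SNP is scored by the best balanced accuracy of any one- or two-locus model that contains it. The function names `mdr_balanced_accuracy` and `mdr_filter`, the ranking rule, and the use of training accuracy without cross-validation are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations, product


def mdr_balanced_accuracy(genotypes, status, snps):
    """Core MDR step for one SNP combination (training data only, no cross-validation).

    genotypes: list of rows, each a sequence of 0/1/2 genotype codes
    status: list of 1 (case) / 0 (control); assumes both classes are present
    snps: tuple of column indices to combine into multilocus cells
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    # Pool individuals into multilocus genotype cells.
    for row, y in zip(genotypes, status):
        cell = tuple(row[s] for s in snps)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_case = sum(cases.values())
    n_ctrl = sum(controls.values())
    tp = fp = 0
    for cell in set(cases) | set(controls):
        # High-risk if the cell's case:control ratio meets or exceeds the overall
        # ratio n_case/n_ctrl (cross-multiplied so the comparison stays in integers).
        if cases[cell] * n_ctrl >= n_case * controls[cell]:
            tp += cases[cell]     # cases classified as high-risk
            fp += controls[cell]  # controls classified as high-risk
    sensitivity = tp / n_case
    specificity = 1 - fp / n_ctrl
    return (sensitivity + specificity) / 2


def mdr_filter(genotypes, status, max_order=2, keep=2):
    """Hypothetical filter: score each SNP by the best balanced accuracy of any
    1- to max_order-locus MDR model containing it, then keep the top `keep` SNPs."""
    n_snps = len(genotypes[0])
    best = {s: 0.0 for s in range(n_snps)}
    for order in range(1, max_order + 1):
        for combo in combinations(range(n_snps), order):
            acc = mdr_balanced_accuracy(genotypes, status, combo)
            for s in combo:
                best[s] = max(best[s], acc)
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:keep], best


# Toy data: SNPs 0 and 1 jointly determine status; SNP 2 has no effect.
rows = [list(g) for g in product(range(3), repeat=3)]
y = [(g0 + g1) % 2 for g0, g1, _ in rows]
kept, scores = mdr_filter(rows, y, max_order=2, keep=2)
print(kept)  # [0, 1]
```

Because the two-locus model over SNPs 0 and 1 separates cases from controls perfectly, both SNPs reach a balanced accuracy of 1.0 and pass the filter, while the noise SNP peaks at 0.65 only through its pairings with them. Full MDR implementations typically add cross-validation and permutation testing, which this sketch omits.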