Wang Zhi, Maity Arnab, Luo Yiwen, Neely Megan L, Tzeng Jung-Ying
Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America.
Genet Epidemiol. 2015 Feb;39(2):122-33. doi: 10.1002/gepi.21877. Epub 2014 Dec 23.
Studying complex diseases in the post genome-wide association studies (GWAS) era has led to the development of methods that consider factor-sets rather than individual genetic/environmental factors (i.e., Multi-G-Multi-E studies), and mining for potential gene-environment (G×E) interactions has proven to be an invaluable aid both in discovery and in deciphering underlying biological mechanisms. Current approaches for examining effect profiles in Multi-G-Multi-E analyses are underpowered due to large degrees of freedom, ill-suited for detecting G×E interactions due to imprecise modeling of the G and E effects, or unable to model interactions between two factor-sets (e.g., existing methods focus primarily on a single E factor). In this work, we illustrate the issues encountered in constructing kernels for investigating interactions between two factor-sets, and propose a simple, intuitive way to construct the G×E kernel that retains the ease of interpretation of classic regression. We also construct a series of kernel machine (KM) score tests to evaluate the complete effect profile (i.e., the G, E, and G×E effects individually or in combination). We show, via simulations and a data application, that the proposed KM methods outperform classic and principal component (PC) regression across a range of scenarios, including varying effect size, effect structure, and interaction complexity. The largest power gain was observed when the underlying effect structure involved complex G×E interactions; however, the proposed methods perform consistently well whether the effect profile is simple or complex, suggesting that they could be a useful tool for exploratory or confirmatory G×E analysis.
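The abstract does not spell out how the G×E kernel is built. As a minimal sketch, assuming linear kernels for the G and E factor-sets and assuming the interaction kernel is taken as their elementwise (Hadamard) product, the code below shows why such a construction keeps the interpretation of classic regression: the product kernel equals the linear kernel of all pairwise G_j × E_k cross-product terms. The function names (`linear_kernel`, `gxe_kernel`, `pairwise_products`) and the simulated data are hypothetical and chosen only for illustration; this is not presented as the authors' exact method.

```python
import numpy as np


def linear_kernel(X):
    """Return the n x n linear kernel X X^T for an n x p design matrix."""
    return X @ X.T


def gxe_kernel(K_G, K_E):
    """Assumed construction: elementwise (Hadamard) product of two n x n kernels."""
    return K_G * K_E


def pairwise_products(G, E):
    """All p*q cross-product columns G_j * E_k, as in a classic interaction regression."""
    n = G.shape[0]
    return (G[:, :, None] * E[:, None, :]).reshape(n, -1)


# Simulated data (illustrative only): 50 subjects, 4 SNPs coded 0/1/2, 3 exposures.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(50, 4)).astype(float)
E = rng.normal(size=(50, 3))

K_GE = gxe_kernel(linear_kernel(G), linear_kernel(E))
Z = pairwise_products(G, E)  # 50 x 12 matrix of interaction terms

# The Hadamard-product kernel matches the linear kernel of the interaction terms.
print(np.allclose(K_GE, Z @ Z.T))
```

With nonlinear base kernels the same elementwise product could still be formed, but the equivalence to explicit cross-product regressors shown here holds only for the linear case.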