Wang Yaping, Li Donghui, Wei Peng
Department of Biostatistics, School of Public Health, University of Texas Health Science Center.
Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center.
Cancer Inform. 2015 Jun 4;14(Suppl 2):209-18. doi: 10.4137/CIN.S17305. eCollection 2015.
Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases, including cancers. However, these many GWAS-identified genetic loci explain only a small proportion of disease heritability. This "missing heritability" problem has been partly attributed to yet-to-be-identified gene-gene (G × G) and gene-environment (G × E) interactions. Although G × G and G × E interactions are important for understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power and has yielded few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey's one degree-of-freedom (1-df) test on the leading PCs. We derive a score test for the proposed Tukey's 1-df interaction test that is computationally fast and numerically stable. Using extensive simulations, we show that the proposed approach, which combines two parsimonious models, namely PCA and Tukey's 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method by testing G × G interactions for Crohn's disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and by testing G × E interaction using data from a case-control study of pancreatic cancer.
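The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a logistic model for case-control status, assumes the Tukey 1-df interaction term is the product of the two genes' fitted main-effect linear predictors, and uses a generic score test for that single term. The function names (leading_pcs, fit_logistic, tukey_score_test), the choice of two PCs per gene, and the simulated genotypes are all hypothetical choices made for illustration.

```python
import math
import numpy as np

def leading_pcs(snps, n_pcs):
    """Step 1: project centered genotypes onto the top principal components."""
    centered = snps - snps.mean(axis=0)
    # Rows of vt are the PC loadings, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_pcs].T

def fit_logistic(design, y, n_iter=50, tol=1e-8):
    """Fit the main-effects (null) logistic model by Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-design @ beta))
        w = mu * (1.0 - mu)
        info = design.T @ (design * w[:, None])
        step = np.linalg.solve(info, design.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def tukey_score_test(pcs_g, pcs_h, y):
    """Step 2: 1-df score test for theta in
    logit P(y=1) = b0 + G*beta + H*gamma + theta * (G*beta) * (H*gamma),
    evaluated at theta = 0 (assumed model form, see lead-in)."""
    n, kg = pcs_g.shape
    design = np.column_stack([np.ones(n), pcs_g, pcs_h])
    coef = fit_logistic(design, y)
    mu = 1.0 / (1.0 + np.exp(-design @ coef))
    w = mu * (1.0 - mu)
    # Interaction covariate: product of the two fitted main-effect predictors.
    z = (pcs_g @ coef[1:1 + kg]) * (pcs_h @ coef[1 + kg:])
    score = z @ (y - mu)
    # Variance of the score, adjusted for the estimated main effects.
    xwz = design.T @ (w * z)
    info = design.T @ (design * w[:, None])
    var = z @ (w * z) - xwz @ np.linalg.solve(info, xwz)
    stat = float(score ** 2 / var)
    # Upper tail of a chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Simulated example: two genes with 8 and 6 SNPs, no true interaction.
rng = np.random.default_rng(0)
n = 500
gene_g = rng.binomial(2, 0.3, size=(n, 8))
gene_h = rng.binomial(2, 0.4, size=(n, 6))
y = rng.binomial(1, 0.5, size=n).astype(float)

pcs_g = leading_pcs(gene_g, n_pcs=2)
pcs_h = leading_pcs(gene_h, n_pcs=2)
stat, p_value = tukey_score_test(pcs_g, pcs_h, y)
print(f"score statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

Because only the main-effects model is fitted, the test needs a single model fit per gene pair, which is consistent with the computational speed the abstract attributes to a score test. For a G × E test, the same sketch would apply with pcs_h replaced by a one-column matrix holding the environmental exposure; this too is an assumption rather than a detail stated in the abstract.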