Emily Mathieu
Stat Appl Genet Mol Biol. 2016 Apr;15(2):151-71. doi: 10.1515/sagmb-2015-0074.
Among the large number of statistical methods that have been proposed to identify gene-gene interactions in case-control genome-wide association studies (GWAS), gene-based methods have recently grown in popularity because they confer advantages in both statistical power and biological interpretation. All of the gene-based methods jointly model the distribution of single nucleotide polymorphism (SNP) sets prior to the statistical test, leading to limited power to detect sums of SNP-SNP signals. In this paper, we instead propose a gene-based method that first performs SNP-SNP interaction tests and then aggregates the resulting p-values into a test at the gene level. Our method, called AGGrEGATOr, is based on a minP procedure that tests the significance of the minimum of a set of p-values. We use simulations to assess the capacity of AGGrEGATOr to correctly control the type-I error. The benefits of our approach in terms of statistical power and robustness to SNP set characteristics are evaluated in a wide range of disease models by comparing it to previous methods. We also apply our method to detect gene pairs associated with rheumatoid arthritis (RA) in the GSE39428 dataset. We identify 13 potential gene-gene interactions and replicate one gene pair in the Wellcome Trust Case Control Consortium dataset at the 5% level. We further test 15 gene pairs, previously reported as being statistically associated with RA, Crohn's disease (CD), or coronary artery disease (CAD), for replication in the Wellcome Trust Case Control Consortium dataset. We show that AGGrEGATOr is the only method able to successfully replicate seven gene pairs.
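The two-stage idea described in the abstract (SNP-SNP tests first, then a minP aggregation at the gene level) can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the abstract: the SNP-SNP test is a toy comparison of genotype correlation between cases and controls (Fisher z-transform), and the significance of the minimum p-value is calibrated by permuting case-control labels. All function names (`pair_pvalue`, `min_p`, `minp_gene_test`) are hypothetical, and this should not be read as AGGrEGATOr's actual implementation.

```python
import math
import random


def _corr(x, y):
    """Pearson correlation, clipped away from +/-1; 0.0 if a vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return max(-0.999999, min(0.999999, r))


def pair_pvalue(x, y, labels):
    """Toy SNP-SNP interaction test (an assumption, not the paper's test):
    compare the x-y correlation in cases (label 1) and controls (label 0)."""
    cases = [i for i, lab in enumerate(labels) if lab == 1]
    ctrls = [i for i, lab in enumerate(labels) if lab == 0]
    if len(cases) < 4 or len(ctrls) < 4:
        return 1.0
    z1 = math.atanh(_corr([x[i] for i in cases], [y[i] for i in cases]))
    z0 = math.atanh(_corr([x[i] for i in ctrls], [y[i] for i in ctrls]))
    se = math.sqrt(1.0 / (len(cases) - 3) + 1.0 / (len(ctrls) - 3))
    z = (z1 - z0) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def min_p(gene_a, gene_b, labels):
    """Smallest p-value over all SNP pairs (one SNP from each gene)."""
    return min(pair_pvalue(x, y, labels) for x in gene_a for y in gene_b)


def minp_gene_test(gene_a, gene_b, labels, n_perm=200, seed=0):
    """Gene-level p-value for the observed minimum SNP-SNP p-value,
    calibrated by permuting the case-control labels (assumed scheme)."""
    rng = random.Random(seed)
    observed = min_p(gene_a, gene_b, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if min_p(gene_a, gene_b, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `gene_a` and `gene_b` are lists of genotype vectors (one vector of 0/1/2 allele counts per SNP), and `labels` marks cases as 1 and controls as 0. Permuting the labels preserves the correlation among SNPs within each gene, which is why the minimum p-value is compared against its own permutation distribution rather than corrected as if the SNP-SNP tests were independent.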