Pecanka Jakub, Jonker Marianne A, Bochdanovits Zoltan, Van Der Vaart Aad W
Leiden University Medical Center, Department of Medical Statistics and Bioinformatics, Leiden, The Netherlands and VU University, Department of Mathematics, Amsterdam, The Netherlands.
VU University Medical Center, Department of Epidemiology and Biostatistics, Amsterdam, The Netherlands and Radboud University medical center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands.
Biostatistics. 2017 Jul 1;18(3):477-494. doi: 10.1093/biostatistics/kxw060.
For over a decade, functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the "missing heritability" of complex traits. However, searching for epistasis on the genome-wide scale has been challenging because the prohibitively large number of tests results in a serious loss of statistical power as well as computational difficulties. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci, while the more demanding score tests are restricted to the most promising pairs, a genome-wide association study (GWAS) for epistasis becomes feasible. By design, our method provides strong control of the type I error. Its favourable power properties, especially under the practically relevant misspecification of the interaction model, are illustrated. Ready-to-use software is available. Using the method, we analyzed Parkinson's disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
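The first-stage screening described in the abstract can be illustrated with a minimal sketch. This is not the authors' software. It assumes genotypes are coded 0, 1, 2, and it uses a Pearson chi-square test on the 3x3 two-locus table as the independence test, which is one possible choice since the abstract does not name the statistic. The function names, the toy data, and the screening threshold are hypothetical. The second-stage logistic regression score test is not shown.

```python
import math
from itertools import combinations


def case_independence_pvalue(g1, g2):
    """Pearson chi-square test of independence between two loci, cases only.

    g1, g2: equal-length sequences of genotype codes (0, 1, 2), an assumed coding.
    Returns (statistic, p_value) for the 3x3 table, which has 4 degrees of freedom.
    """
    n = len(g1)
    table = [[0] * 3 for _ in range(3)]
    for a, b in zip(g1, g2):
        table[a][b] += 1
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(3)) for j in range(3)]
    if 0 in row or 0 in col:
        # Simplification for this sketch: a missing genotype would change the df.
        raise ValueError("sketch assumes every genotype occurs at both loci")
    stat = 0.0
    for i in range(3):
        for j in range(3):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Chi-square survival function with 4 df in closed form: exp(-x/2) * (1 + x/2).
    p_value = math.exp(-stat / 2) * (1 + stat / 2)
    return stat, p_value


def screen_pairs(case_genotypes, threshold):
    """Stage 1: keep the locus pairs whose case-only p-value falls below threshold.

    case_genotypes: dict mapping a locus name to its genotype codes among cases.
    Only the returned pairs would go on to the second-stage interaction test.
    """
    survivors = []
    for a, b in combinations(sorted(case_genotypes), 2):
        _, p_value = case_independence_pvalue(case_genotypes[a], case_genotypes[b])
        if p_value < threshold:
            survivors.append((a, b))
    return survivors


# Toy data (hypothetical): locus B copies locus A, locus C is balanced against both.
toy_cases = {
    "A": [i % 3 for i in range(90)],
    "B": [i % 3 for i in range(90)],
    "C": [(i // 3) % 3 for i in range(90)],
}
promising = screen_pairs(toy_cases, threshold=0.01)
print(promising)
```

In this toy example only the dependent pair of loci survives the screen, so a single pair rather than three would need the more expensive second-stage test.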