Song Minsun, Wheeler William, Caporaso Neil E, Landi Maria Teresa, Chatterjee Nilanjan
Department of Statistics, Sookmyung Women's University, Seoul, Korea.
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, Maryland, United States of America.
Genet Epidemiol. 2018 Mar;42(2):146-155. doi: 10.1002/gepi.22093. Epub 2017 Nov 26.
Genome-wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) using powerful statistical imputation algorithms trained on reference datasets. Using the predicted allele count of an imputed SNP as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how best to handle imputed SNPs in various modern complex tests for genetic association that incorporate gene-environment interactions. We focus on case-control association studies, where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene-environment independence in the underlying population. As increasingly large-scale GWAS are being performed through consortium efforts, where it is preferable to share only summary-level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta-analysis of "one-step" maximum-likelihood estimates across studies. Applications of the methods in simulation studies and in a dataset from a GWAS of lung cancer illustrate the ability of the proposed methods to maintain type-I error rates for the underlying testing procedures. For the analysis of imputed SNPs, as for typed SNPs, the retrospective methods can lead to considerable efficiency gains in modeling gene-environment interactions under the assumption of gene-environment independence. The methods are publicly available through the CGEN R software package.
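Two ideas in the abstract can be illustrated with a short sketch: the dosage variable (the expected allele count under the imputed genotype probabilities) and the way that inverse-variance meta-analysis of "one-step" estimates reproduces a pooled score statistic. The sketch is written in Python and is not the CGEN implementation. The function names are hypothetical, and it assumes the simplest possible setting: a single scalar parameter tested at the null value 0, with no nuisance parameters, so that each study contributes only a score U and an information I. The tests studied in the paper involve nuisance parameters and interaction terms, which this sketch does not attempt to cover.

```python
import numpy as np


def dosage(probs):
    """Expected allele count from imputed genotype probabilities.

    probs has shape (..., 3) holding P(G=0), P(G=1), P(G=2).
    """
    probs = np.asarray(probs, dtype=float)
    return probs[..., 1] + 2.0 * probs[..., 2]


def one_step_estimate(score, information):
    """One-step estimate starting from the null value 0 (scalar case).

    Assumption: no nuisance parameters, so beta = U / I and var = 1 / I.
    """
    return score / information, 1.0 / information


def meta_score_statistic(scores, informations):
    """Inverse-variance meta-analysis of per-study one-step estimates.

    Returns the squared Z statistic, which in this scalar setting equals
    the pooled score statistic (sum U)^2 / (sum I).
    """
    pairs = [one_step_estimate(u, i) for u, i in zip(scores, informations)]
    betas = np.array([b for b, _ in pairs])
    weights = np.array([1.0 / v for _, v in pairs])
    beta_meta = np.sum(weights * betas) / np.sum(weights)
    var_meta = 1.0 / np.sum(weights)
    return beta_meta ** 2 / var_meta


if __name__ == "__main__":
    # Hypothetical genotype probabilities for two individuals.
    print(dosage([[0.1, 0.3, 0.6], [1.0, 0.0, 0.0]]))
    # Hypothetical per-study scores and informations.
    print(meta_score_statistic([2.0, 1.0], [4.0, 5.0]))
```

In this simplified setting each study only needs to share its one-step estimate and variance, and the combined statistic matches what would be obtained by summing scores and informations across studies.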