Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599-7420, USA.
Genet Epidemiol. 2010 Dec;34(8):803-15. doi: 10.1002/gepi.20527.
Analysis of untyped single nucleotide polymorphisms (SNPs) can facilitate the localization of disease-causing variants and permit meta-analysis of association studies with different genotyping platforms. We present two approaches for using the linkage disequilibrium structure of an external reference panel to infer the unknown value of an untyped SNP from the observed genotypes of typed SNPs. The maximum-likelihood approach integrates the prediction of untyped genotypes and estimation of association parameters into a single framework and yields consistent and efficient estimators of genetic effects and gene-environment interactions with proper variance estimators. The imputation approach is a two-stage strategy, which first imputes the untyped genotypes by either the most likely genotypes or the expected genotype counts and then uses the imputed values in a downstream association analysis. The latter approach has proper control of type I error in single-SNP tests with possible covariate adjustments even when the reference panel is misspecified; however, type I error may not be properly controlled in testing multiple-SNP effects or gene-environment interactions. In general, imputation yields biased estimators of genetic effects and gene-environment interactions, and the variances are underestimated. We conduct extensive simulation studies to compare the bias, type I error, power, and confidence interval coverage between the maximum likelihood and imputation approaches in the analysis of single-SNP effects, multiple-SNP effects, and gene-environment interactions under cross-sectional and case-control designs. In addition, we provide an illustration with genome-wide data from the Wellcome Trust Case-Control Consortium (WTCCC) [2007].
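The two imputation variants in stage 1 (most likely genotype versus expected genotype count) can be illustrated with a minimal sketch. It assumes that each subject already has posterior probabilities (P(G=0), P(G=1), P(G=2)) for the untyped SNP, given the typed SNPs. How those probabilities are derived from the reference panel's linkage disequilibrium structure is not modeled here. Stage 2 is shown as a simple least-squares slope for a quantitative trait, with no covariates. The function names are hypothetical. This is not the authors' software, and the maximum-likelihood approach is not shown. The sketch also ignores imputation uncertainty in stage 2, which is the source of the bias and variance underestimation described in the abstract.

```python
from typing import List, Sequence, Tuple

# Posterior genotype probabilities for one untyped SNP in one subject:
# (P(G=0), P(G=1), P(G=2)). ASSUMPTION: these are supplied by some
# reference-panel-based predictor that is not modeled in this sketch.
Probs = Tuple[float, float, float]


def expected_dosage(p: Probs) -> float:
    """Expected genotype count E[G] = 0*p0 + 1*p1 + 2*p2."""
    return p[1] + 2.0 * p[2]


def most_likely_genotype(p: Probs) -> int:
    """Hard call: the genotype value (0, 1 or 2) with the largest posterior probability."""
    return max(range(3), key=lambda g: p[g])


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x in a linear model with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def two_stage_estimate(probs: List[Probs], trait: Sequence[float],
                       use_dosage: bool = True) -> float:
    # Stage 1: replace each unknown genotype with a single imputed value,
    # either the expected count or the most likely genotype.
    imputed = [expected_dosage(p) if use_dosage else float(most_likely_genotype(p))
               for p in probs]
    # Stage 2: treat the imputed values as if they were observed genotypes.
    # No variance is computed, and imputation uncertainty is not propagated.
    return ols_slope(imputed, trait)


if __name__ == "__main__":
    probs = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.3, 0.7), (0.6, 0.4, 0.0)]
    trait = [1.0, 2.0, 3.0, 1.5]
    print("dosage-based slope:", two_stage_estimate(probs, trait, use_dosage=True))
    print("hard-call slope:", two_stage_estimate(probs, trait, use_dosage=False))
```

The two variants can give different effect estimates from the same probabilities. Neither reflects the uncertainty in the imputed values. In the maximum-likelihood approach, the untyped genotype is integrated out of the likelihood instead of being plugged in.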