Biernacka Joanna M, Tang Rui, Li Jia, McDonnell Shannon K, Rabe Kari G, Sinnwell Jason P, Rider David N, de Andrade Mariza, Goode Ellen L, Fridley Brooke L
Department of Health Sciences Research, Mayo Clinic, 200 First Street Southwest, Rochester, MN 55905 USA.
BMC Proc. 2009 Dec 15;3 Suppl 7(Suppl 7):S5. doi: 10.1186/1753-6561-3-s7-s5.
Several methods have been proposed to impute genotypes at untyped markers using observed genotypes and genetic data from a reference panel. We used the Genetic Analysis Workshop 16 rheumatoid arthritis case-control dataset to compare the performance of four of these imputation methods: IMPUTE, MACH, PLINK, and fastPHASE. We compared the methods' imputation error rates and the performance of association tests using the imputed data in two settings: imputing completely untyped markers, and imputing missing genotypes in order to combine two datasets genotyped at different sets of markers. As expected, all methods performed better for single-nucleotide polymorphisms (SNPs) in high linkage disequilibrium with genotyped SNPs. Among the methods, MACH and IMPUTE generated lower imputation error rates than fastPHASE and PLINK. Association tests based on allele "dosage" from MACH and tests based on the posterior probabilities from IMPUTE provided results closest to those based on complete data. However, in both settings, none of the imputation-based tests provided the same level of evidence of association as the complete data at SNPs strongly associated with disease.
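The abstract compares imputation error rates and dosage-based association tests without defining them. The Python sketch below illustrates both ideas under stated assumptions; it is not the study's code and may not match its exact statistics. It assumes that each imputed genotype is given as three posterior probabilities (P(0), P(1), P(2) copies of the coded allele), as in IMPUTE-style output. It takes the error rate to be the fraction of masked genotypes whose most likely imputed call differs from the true genotype. For association, it uses a 1-df Armitage trend (score) test, applied either to true genotypes or to allele dosages E[G] = P(1) + 2·P(2). All function names and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Assumed input format: posterior probabilities of 0, 1 and 2 copies
# of the coded allele for one individual at one SNP.
Posterior = Tuple[float, float, float]


def dosage(p: Posterior) -> float:
    """Expected allele count (dosage): E[G] = P(G=1) + 2 * P(G=2)."""
    return p[1] + 2.0 * p[2]


def best_guess(p: Posterior) -> int:
    """Most likely genotype call (0, 1 or 2 copies of the coded allele)."""
    return max(range(3), key=lambda g: p[g])


def imputation_error_rate(true_genotypes: Sequence[int],
                          posteriors: Sequence[Posterior]) -> float:
    """Assumed definition: share of masked genotypes miscalled by best guess."""
    wrong = sum(best_guess(p) != g for g, p in zip(true_genotypes, posteriors))
    return wrong / len(true_genotypes)


def trend_test(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Armitage trend (logistic score) test of a 0/1 phenotype y on x.

    x may hold hard genotype calls (complete data) or dosages (imputed
    data). Returns (chi-square with 1 df, p-value), where chi2 = N * r^2.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("genotype or phenotype has no variation")
    chi2 = n * s_xy ** 2 / (s_xx * s_yy)
    # Upper tail of a 1-df chi-square: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data: true genotypes, imputed posteriors, case status.
truth = [0, 1, 2, 1, 0, 2]
post = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.3, 0.7),
        (0.6, 0.4, 0.0), (0.8, 0.2, 0.0), (0.1, 0.2, 0.7)]
status = [0, 1, 1, 0, 0, 1]

dos = [dosage(p) for p in post]
print(imputation_error_rate(truth, post))  # 1 of 6 calls wrong: 0.1666...
chi2_full, p_full = trend_test(truth, status)  # complete-data test
chi2_imp, p_imp = trend_test(dos, status)      # dosage-based test
print(chi2_full, chi2_imp)
```

Comparing the complete-data and dosage-based statistics at each SNP mirrors, in miniature, the kind of comparison the abstract describes. Using dosages rather than best-guess calls keeps the imputation uncertainty in the test instead of discarding it.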