Division of Biostatistics, Washington University in St. Louis, School of Medicine, St. Louis, Missouri 63110-1093, USA.
Genet Epidemiol. 2012 Jul;36(5):508-16. doi: 10.1002/gepi.21647. Epub 2012 May 29.
Genotype imputation infers untyped single nucleotide polymorphisms (SNPs) that are present on a reference panel such as those from the HapMap Project. It is popular for increasing statistical power and for comparing results across studies that use different genotyping platforms. Imputation for African American populations is challenging because their linkage disequilibrium blocks are shorter and because, owing to admixture, no ideal reference panel is available. In this paper, we evaluated three imputation strategies for African Americans. The intersection strategy used a combined panel consisting of SNPs polymorphic in both CEU and YRI. The union strategy used a panel consisting of SNPs polymorphic in either CEU or YRI. The merge strategy merged results from two separate imputations, one using CEU and the other using YRI. Because investigators are increasingly using data from the 1000 Genomes (1KG) Project for genotype imputation, we evaluated both 1KG-based and HapMap-based imputations. We used 23,707 SNPs from chromosomes 21 and 22 on the Affymetrix SNP Array 6.0, genotyped for 1,075 HyperGEN African Americans. We found that 1KG-based imputations provided a substantially larger number of variants than HapMap-based imputations: about three times as many common variants and eight times as many rare and low-frequency variants. This higher yield is expected because the 1KG panel includes more SNPs. Accuracy rates using 1KG data were slightly lower than those using HapMap data before filtering, but slightly higher after filtering. The union strategy provided the highest imputation yield with the next-highest accuracy. The intersection strategy provided the lowest imputation yield but the highest accuracy. The merge strategy provided the lowest imputation accuracy. We observed that SNPs polymorphic only in CEU had much lower accuracy, which reduced the accuracy of the union strategy.
Our findings suggest that 1KG-based imputations can facilitate discovery of significant associations for SNPs across the whole minor allele frequency (MAF) spectrum. Because the 1KG Project is still under way, we expect that later versions will provide better imputation performance.
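The three strategies described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the SNP identifiers, and the (dosage, quality) representation of an imputed call are hypothetical. In particular, the abstract does not say how the merge strategy resolved SNPs imputed by both runs, so the rule used here (keep the call with the higher quality score) is purely an assumption for illustration.

```python
def build_panels(ceu_polymorphic, yri_polymorphic):
    """Return (intersection_panel, union_panel) as sets of SNP IDs.

    intersection: SNPs polymorphic in both CEU and YRI
    union:        SNPs polymorphic in either CEU or YRI
    """
    ceu = set(ceu_polymorphic)
    yri = set(yri_polymorphic)
    return ceu & yri, ceu | yri


def merge_imputations(ceu_calls, yri_calls):
    """Combine the results of two separate imputation runs.

    Each argument maps SNP ID -> (dosage, quality).
    ASSUMPTION: when both runs impute the same SNP, keep the call with
    the higher quality score. The source does not specify this rule.
    """
    merged = dict(ceu_calls)
    for snp, call in yri_calls.items():
        if snp not in merged or call[1] > merged[snp][1]:
            merged[snp] = call
    return merged


# Toy data (hypothetical SNP IDs and values).
ceu_snps = {"rs1", "rs2", "rs3"}
yri_snps = {"rs2", "rs3", "rs4"}
intersection_panel, union_panel = build_panels(ceu_snps, yri_snps)

merged = merge_imputations(
    {"rs1": (0.9, 0.80), "rs2": (1.1, 0.60)},  # CEU-based run
    {"rs2": (1.0, 0.95), "rs4": (0.2, 0.70)},  # YRI-based run
)
```

In this toy example the union panel also contains `rs1`, which is polymorphic only in CEU; the abstract reports that such SNPs had much lower accuracy, which is why the union strategy trades some accuracy for its higher yield.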