Sung Yun Ju, Wang Lihua, Rankinen Tuomo, Bouchard Claude, Rao D C
Division of Biostatistics, School of Medicine, Washington University in St. Louis, Mo. 63110, USA.
Hum Hered. 2012;73(1):18-25. doi: 10.1159/000334084. Epub 2011 Dec 30.
Genotype imputations based on 1000 Genomes (1KG) Project data have the advantage of imputing many more SNPs than imputations based on HapMap data. They also provide an opportunity to discover associations with relatively rare variants. Recent investigations increasingly use 1KG data for genotype imputation, but only limited evaluations of the performance of this approach are available. In this paper, we empirically evaluated imputation performance using 1KG data by comparing the results with those obtained using the widely used HapMap Phase II data. We used three reference panels: two CEU panels of 120 haplotypes each, one from HapMap II and one from 1KG data (June 2010 release), and the EUR panel of 566 haplotypes, also from 1KG data (August 2010 release). The study data were 324,607 autosomal Illumina SNPs genotyped in 501 individuals of European ancestry. Our most important finding was that both 1KG reference panels provided a much higher imputation yield than the HapMap II panel: more than twice as many SNPs were successfully imputed (6.7 million vs. 2.5 million). Our second most important finding was that accuracy using both 1KG panels was high and almost identical to accuracy using the HapMap II panel. Furthermore, after removing SNPs with MACH Rsq <0.3, accuracy for both rare and low-frequency SNPs was very high and almost identical to accuracy for common SNPs. We found that imputation using the 1KG-EUR panel had advantages in successfully imputing rare, low-frequency and common variants. Our findings suggest that 1KG-based imputation can increase the opportunity to discover significant associations for SNPs across the allele frequency spectrum. Because the 1KG Project is still underway, we expect that later versions will provide even better imputation performance.
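The quality-control step mentioned in the abstract (removing SNPs with MACH Rsq <0.3 and then looking at rare, low-frequency and common SNPs separately) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `ImputedSnp` record, the function names, and the minor allele frequency (MAF) cut-offs of 1% and 5% are assumptions chosen for illustration, since the abstract does not state how the frequency classes were defined. Only the Rsq threshold of 0.3 comes from the text.

```python
from dataclasses import dataclass

# From the abstract: SNPs with MACH Rsq < 0.3 are removed.
RSQ_THRESHOLD = 0.3

# Assumed MAF cut-offs (hypothetical; not stated in the abstract).
RARE_MAX = 0.01       # MAF < 1%  -> "rare"
LOW_FREQ_MAX = 0.05   # MAF < 5%  -> "low_frequency", otherwise "common"


@dataclass
class ImputedSnp:
    """Hypothetical record for one imputed SNP."""
    snp_id: str
    maf: float  # minor allele frequency
    rsq: float  # MACH Rsq imputation quality estimate


def frequency_class(maf: float) -> str:
    """Assign a SNP to a frequency class using the assumed cut-offs."""
    if maf < RARE_MAX:
        return "rare"
    if maf < LOW_FREQ_MAX:
        return "low_frequency"
    return "common"


def filter_and_bin(snps, rsq_threshold: float = RSQ_THRESHOLD) -> dict:
    """Drop SNPs below the Rsq threshold and count the rest per class."""
    counts = {"rare": 0, "low_frequency": 0, "common": 0}
    for snp in snps:
        if snp.rsq < rsq_threshold:
            continue  # poorly imputed SNP, excluded from the yield
        counts[frequency_class(snp.maf)] += 1
    return counts


example_snps = [
    ImputedSnp("rs1", maf=0.005, rsq=0.9),
    ImputedSnp("rs2", maf=0.03, rsq=0.2),   # removed: Rsq below 0.3
    ImputedSnp("rs3", maf=0.03, rsq=0.3),   # kept: Rsq is not below 0.3
    ImputedSnp("rs4", maf=0.25, rsq=0.8),
]
example_counts = filter_and_bin(example_snps)
```

Counting the SNPs that survive the filter in each class is one simple way to compare imputation yield between reference panels; the SNP identifiers and values above are made up for the example.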