Malhotra Alka, Kobes Sayuko, Bogardus Clifton, Knowler William C, Baier Leslie J, Hanson Robert L
Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, Arizona, United States of America.
PLoS One. 2014 Jul 11;9(7):e102544. doi: 10.1371/journal.pone.0102544. eCollection 2014.
Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed the accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians.
Data from six randomly selected chromosomes were used. For each chromosome, genotypes at either 1% or 20% of the available SNPs were masked in the study population. The masked genotypes were then imputed using the Markov Chain Haplotyping (MaCH) software. With four HapMap reference populations, average genotype error rates ranged from 7.86% for Mexican Americans to 22.30% for Yoruba. In contrast, using the original Pima Indian data as a reference gave an average error rate of 1.73%.
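The mask-and-compare evaluation described above can be illustrated with a minimal sketch. It assumes that the error rate is the fraction of masked genotype calls that differ from the true call; the abstract does not state the exact definition, so this is an assumption. The function names `mask_snps` and `genotype_error_rate` and the toy data are hypothetical, and the imputation step itself (run with the haplotyping software) is not reproduced here.

```python
import random


def mask_snps(n_snps, fraction, seed=0):
    """Pick a random subset of SNP indices to hide (hypothetical helper)."""
    rng = random.Random(seed)
    n_masked = max(1, round(n_snps * fraction))
    return sorted(rng.sample(range(n_snps), n_masked))


def genotype_error_rate(true_genotypes, imputed_genotypes, masked):
    """Fraction of masked genotype calls that differ from the true call.

    Each genotype table is a list of per-individual rows; each row is a
    list of allele counts (0, 1 or 2) indexed by SNP. This per-genotype
    discordance is an assumed definition, not one taken from the study.
    """
    total = 0
    errors = 0
    for true_row, imputed_row in zip(true_genotypes, imputed_genotypes):
        for snp in masked:
            total += 1
            if true_row[snp] != imputed_row[snp]:
                errors += 1
    return errors / total if total else 0.0


# Toy data: two individuals, four SNPs; SNPs 1 and 2 are treated as masked.
true_calls = [[0, 1, 2, 0], [1, 1, 0, 2]]
imputed_calls = [[0, 1, 1, 0], [1, 0, 0, 2]]
rate = genotype_error_rate(true_calls, imputed_calls, masked=[1, 2])
print(f"genotype error rate: {rate:.2%}")
```

In a real evaluation, the imputed table would come from running the imputation software on the data with the masked SNPs removed, once per reference population, so that the resulting rates can be compared across references.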
Our results suggest that using HapMap reference populations leads to substantial inaccuracy in the imputation of genotypes in American Indians. A possible solution would be to densely genotype or sequence a reference American Indian population.