Huang Lucy, Li Yun, Singleton Andrew B, Hardy John A, Abecasis Gonçalo, Rosenberg Noah A, Scheet Paul
Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Am J Hum Genet. 2009 Feb;84(2):235-50. doi: 10.1016/j.ajhg.2009.01.013.
A current approach to mapping complex-disease-susceptibility loci in genome-wide association (GWA) studies involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and tested for disease association. This imputation strategy has been successful for GWA studies in populations well represented by existing reference panels. We used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the "portability" of the HapMap reference panels for imputation in studies of diverse populations. When a single HapMap panel was leveraged for imputation of randomly masked genotypes, European populations had the highest imputation accuracy, followed by populations from East Asia, Central and South Asia, the Americas, Oceania, the Middle East, and Africa. For each population, we identified "optimal" mixtures of reference panels that maximized imputation accuracy, and we found that in most populations, mixtures including individuals from at least two HapMap panels produced the highest imputation accuracy. From a separate survey of additional SNPs typed in the same samples, we evaluated imputation accuracy in the scenario in which all genotypes at a given SNP position were unobserved and were imputed on the basis of data from a commercial "SNP chip," again finding that most populations benefited from the use of combinations of two or more HapMap reference panels. Our results can serve as a guide for selecting appropriate reference panels for imputation-based GWA analysis in diverse populations.
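The masked-genotype evaluation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: genotypes are coded 0/1/2 as minor-allele counts, accuracy is measured as the fraction of alleles recovered at masked positions, and the imputer is a naive stand-in that fills each masked genotype with the most common genotype at that SNP in the reference panel. It is not the LD-based imputation model used in the study, and the function names (`mask_genotypes`, `impute_from_panel`, `allelic_accuracy`, `best_panel`) are hypothetical.

```python
import random
from collections import Counter


def mask_genotypes(genotypes, fraction, rng):
    """Hide a random fraction of genotypes; return (masked copy, hidden positions)."""
    positions = [(i, j) for i, row in enumerate(genotypes) for j in range(len(row))]
    hidden = rng.sample(positions, int(round(fraction * len(positions))))
    masked = [list(row) for row in genotypes]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_from_panel(masked, panel):
    """Naive stand-in imputer: fill gaps with the panel's modal genotype per SNP."""
    modal = [Counter(col).most_common(1)[0][0] for col in zip(*panel)]
    return [[modal[j] if g is None else g for j, g in enumerate(row)] for row in masked]


def allelic_accuracy(truth, imputed, hidden):
    """Fraction of alleles imputed correctly at the masked positions (assumed metric)."""
    if not hidden:
        raise ValueError("no masked positions to score")
    correct = sum(2 - abs(truth[i][j] - imputed[i][j]) for i, j in hidden)
    return correct / (2 * len(hidden))


def best_panel(truth, masked, hidden, panels):
    """Score each candidate panel (or pooled mixture) and return the best name and all scores."""
    scores = {
        name: allelic_accuracy(truth, impute_from_panel(masked, panel), hidden)
        for name, panel in panels.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    study = [[0, 1, 2, 0], [0, 1, 2, 1], [1, 1, 2, 0]]  # toy study genotypes
    panel_a = [[0, 1, 2, 0], [0, 1, 2, 0]]              # toy reference panel
    panel_b = [[2, 0, 0, 2], [2, 0, 0, 2]]              # toy reference panel
    panels = {"A": panel_a, "B": panel_b, "A+B": panel_a + panel_b}
    masked, hidden = mask_genotypes(study, 0.25, rng)
    print(best_panel(study, masked, hidden, panels))
```

A mixture is represented here simply by pooling the individuals of two panels into one list, which mirrors the idea of combining HapMap panels only in the loosest sense; a real analysis would pass the pooled haplotypes to a dedicated imputation program.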