Zhou Wei, Fritsche Lars G, Das Sayantan, Zhang He, Nielsen Jonas B, Holmen Oddgeir L, Chen Jin, Lin Maoxuan, Elvestad Maiken B, Hveem Kristian, Abecasis Goncalo R, Kang Hyun Min, Willer Cristen J
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America.
K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway.
Genet Epidemiol. 2017 Dec;41(8):744-755. doi: 10.1002/gepi.22067. Epub 2017 Sep 1.
The accuracy of genotype imputation depends upon two factors: the sample size of the reference panel and the genetic similarity between the reference panel and the target samples. When consent restrictions prevent multiple reference panels from being combined, it is unclear how to combine the imputation results to optimize the power of genetic association studies. We compared imputation accuracy for 9,265 Norwegian genomes imputed from three reference panels: 1000 Genomes phase 3 (1000G), the Haplotype Reference Consortium (HRC), and a reference panel of 2,201 Norwegian participants from the population-based Nord-Trøndelag Health Study (HUNT), built from low-pass genome sequencing. We observed that the population-matched reference panel allowed imputation of more population-specific variants at lower frequencies (minor allele frequency (MAF) between 0.05% and 0.5%). The overall imputation accuracy from the population-specific panel was substantially higher than that from 1000G and comparable with that from HRC, despite HRC being 15-fold larger. These results recapitulate the value of population-specific reference panels for genotype imputation. We also evaluated different strategies for using multiple sets of imputed genotypes to increase the power of association studies. We observed that testing association for all variants imputed from any panel gives higher power to detect association than the alternative strategy of including only one version of each genetic variant, namely the version with the highest imputation quality metric. This was particularly true for lower-frequency variants (MAF < 1%), even after adjusting for the additional multiple testing burden.
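The two strategies compared in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the records are made-up toy values, the function names are hypothetical, and a Bonferroni correction stands in for whatever multiple-testing adjustment the authors actually applied.

```python
# Toy records: (variant_id, panel, imputation_quality, association_p_value).
# All values are invented for illustration; none come from the study.
RECORDS = [
    ("var1", "HUNT", 0.92, 0.001),
    ("var1", "HRC", 0.95, 0.02),
    ("var2", "HUNT", 0.81, 0.0005),
    ("var3", "1000G", 0.60, 0.2),
    ("var3", "HRC", 0.75, 0.04),
]


def hits_all_versions(records, alpha=0.05):
    """Test every (variant, panel) version; correct for the total number of tests.

    Bonferroni is an assumed stand-in for the paper's adjustment.
    """
    threshold = alpha / len(records)
    return [(vid, panel) for vid, panel, _, pval in records if pval < threshold]


def hits_best_version(records, alpha=0.05):
    """Keep only the version with the highest imputation quality per variant."""
    best = {}
    for rec in records:
        vid, quality = rec[0], rec[2]
        if vid not in best or quality > best[vid][2]:
            best[vid] = rec
    # Fewer tests, so the Bonferroni threshold is less strict here.
    threshold = alpha / len(best)
    return [(vid, panel) for vid, panel, _, pval in best.values() if pval < threshold]


if __name__ == "__main__":
    print("all versions:", hits_all_versions(RECORDS))
    print("best version:", hits_best_version(RECORDS))
```

In this toy data, the version of var1 with the highest quality metric happens to carry the weaker association signal, so selecting on quality alone misses it, while testing both versions still detects it under the stricter threshold. This only shows the mechanism; it says nothing about how often the situation arises in real data.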