Deelen Patrick, Menelaou Androniki, van Leeuwen Elisabeth M, Kanterakis Alexandros, van Dijk Freerk, Medina-Gomez Carolina, Francioli Laurent C, Hottenga Jouke Jan, Karssen Lennart C, Estrada Karol, Kreiner-Møller Eskil, Rivadeneira Fernando, van Setten Jessica, Gutierrez-Achury Javier, Westra Harm-Jan, Franke Lude, van Enckevort David, Dijkstra Martijn, Byelas Heorhiy, van Duijn Cornelia M, de Bakker Paul I W, Wijmenga Cisca, Swertz Morris A
[1] University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands [2] University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, The Netherlands.
Department of Medical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands.
Eur J Hum Genet. 2014 Nov;22(11):1321-6. doi: 10.1038/ejhg.2014.19. Epub 2014 Jun 4.
Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputed genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (minor allele frequency, MAF, 0.05-0.5%) compared with 1000G. In Dutch samples, the mean observed squared Pearson correlation, r², increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r² improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r² increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results.
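The accuracy measure reported in the abstract, a mean squared Pearson correlation between imputed and 'true' genotypes within a MAF range, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the input layout (one pair of imputed dosages and chip genotypes per variant), and the choice to compute MAF from the chip genotypes are all assumptions made for illustration.

```python
import math


def pearson_r2(imputed, truth):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Both inputs hold one value per sample: alternate-allele dosage (0..2)
    for `imputed`, alternate-allele count (0, 1, 2) for `truth`.
    """
    if len(imputed) != len(truth) or len(truth) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    n = len(truth)
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        # No variation in one vector: the correlation is undefined.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


def minor_allele_frequency(genotypes):
    """MAF from alternate-allele counts of diploid samples."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def mean_r2_in_maf_bin(variants, maf_low, maf_high):
    """Mean r2 over variants whose MAF lies in [maf_low, maf_high].

    `variants` is a list of (imputed, truth) pairs (hypothetical layout).
    MAF is taken from `truth` here, which is an assumption of this sketch.
    Variants with undefined r2 are skipped.
    """
    values = []
    for imputed, truth in variants:
        maf = minor_allele_frequency(truth)
        if maf_low <= maf <= maf_high:
            r2 = pearson_r2(imputed, truth)
            if not math.isnan(r2):
                values.append(r2)
    return sum(values) / len(values) if values else float("nan")
```

For the rare-variant range quoted above, a call would look like `mean_r2_in_maf_bin(variants, 0.0005, 0.005)`, run once per reference panel so the resulting means can be compared.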