Kreiner-Møller Eskil, Medina-Gomez Carolina, Uitterlinden André G, Rivadeneira Fernando, Estrada Karol
[1] Department of Internal Medicine, Erasmus University Medical Center, Genetic Laboratory of Internal Medicine, Rotterdam, The Netherlands [2] COPSAC; Copenhagen Prospective Studies on Asthma in Childhood; Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark [3] The Danish Pediatric Asthma Center; Copenhagen University Hospital, Ledreborg Alle 34, Gentofte, Denmark.
Department of Internal Medicine, Erasmus University Medical Center, Genetic Laboratory of Internal Medicine, Rotterdam, The Netherlands.
Eur J Hum Genet. 2015 Mar;23(3):395-400. doi: 10.1038/ejhg.2014.91. Epub 2014 Jun 18.
Genotype imputation has been a pillar of the success of genome-wide association studies (GWAS) in identifying common variants associated with common diseases. However, most GWAS have used only 60 HapMap samples as the imputation reference, so low-frequency and rare variants have not been comprehensively scrutinized. Next-generation arrays that ensure sufficient coverage, together with new reference panels such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) <5%). In this study, we present a two-step imputation approach that improves the quality of 1000 Genomes imputation by genotyping only a subset of samples on a dense array with many low-frequency markers to create a local reference population. In this approach, the study sample, genotyped with a first-generation array, is first imputed to the local reference sample genotyped on the dense array and then to the 1000 Genomes reference panel. We show that mean imputation quality, measured by r², increases by 28% with this approach for variants with a MAF between 1 and 5%, compared with direct imputation to the 1000 Genomes reference. Similarly, the concordance rate between imputed and true genotype calls was significantly higher for heterozygote (P<1e-15) and rare homozygote (P<1e-15) calls in this low-frequency range. In our setting, the two-step approach improves imputation quality compared with traditional direct imputation, notably in the low-frequency spectrum, and is a cost-effective strategy for large epidemiological studies.
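The two quality measures named in the abstract can be illustrated with a minimal sketch. It assumes that r² is taken as the squared Pearson correlation between true genotypes (coded 0/1/2 copies of the minor allele) and imputed allele dosages, and that concordance is computed on best-guess calls obtained by rounding the dosage; the study's exact definitions may differ. The function names `imputation_r2` and `concordance_by_class` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and imputed dosages (0.0-2.0) for a single variant.

    Assumption: this is one common way to define imputation r^2
    when true genotypes are available; it is not necessarily the
    metric used in the study.
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")  # undefined for a monomorphic variant
    return cov * cov / (var_t * var_d)


def concordance_by_class(true_genotypes, imputed_dosages):
    """Fraction of best-guess calls that match the true genotype,
    reported separately for each true genotype class (0, 1, 2).

    Assumption: the best-guess call is the dosage rounded to the
    nearest integer and clamped to the range 0..2.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, d in zip(true_genotypes, imputed_dosages):
        call = min(2, max(0, int(d + 0.5)))
        totals[t] += 1
        hits[t] += int(call == t)
    return {g: hits[g] / totals[g] for g in totals}
```

Under these assumptions, comparing direct and two-step imputation would amount to computing both measures per variant for each strategy and averaging within a MAF bin, for example 1 to 5%. Reporting concordance per genotype class matters at low frequencies because the common homozygote dominates the overall rate.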