Kim Young Jin, Lee Juyoung, Kim Bong-Jo, Park Taesung
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, South Korea.
Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, 363-951, South Korea.
BMC Genomics. 2015 Dec 29;16:1109. doi: 10.1186/s12864-015-2192-y.
Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Because next-generation sequencing is not yet cost-effective for large-scale genomic studies, imputation is a widely used alternative. However, imputation may be limited by the low accuracy of imputed rare variants. Several approaches have been suggested to improve the imputation accuracy of rare variants, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and building local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly rely on reference panels, the imputation accuracy of rare variants can also be increased by using exome chips that contain rare variants. The exome chip contains about 250,000 rare variants selected from the variants discovered in about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, a combined approach that uses a genotype panel of merged exome chip and SNP chip data should increase the imputation accuracy of rare variants.
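As a rough illustration of what a merged genotype panel could look like, the sketch below combines an SNP chip panel and an exome chip panel into one panel restricted to the samples present on both chips. It is not the authors' pipeline: the data layout, the function names `panel_samples` and `merge_panels`, and the rule of setting discordant calls to missing are all assumptions made for the example.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical layout (an assumption, not taken from the study):
# a variant is keyed by (chromosome, position, ref allele, alt allele),
# and a panel maps each variant to per-sample alt-allele counts (0/1/2)
# or None when the call is missing.
Variant = Tuple[str, int, str, str]
Panel = Dict[Variant, Dict[str, Optional[int]]]


def panel_samples(panel: Panel) -> Set[str]:
    """Return every sample ID that appears anywhere in the panel."""
    samples: Set[str] = set()
    for calls in panel.values():
        samples.update(calls)
    return samples


def merge_panels(snp_chip: Panel, exome_chip: Panel) -> Panel:
    """Merge two chip panels into one genotype panel.

    Only samples genotyped on both chips are kept. For a variant present
    on both chips, concordant calls are kept and discordant calls are set
    to missing (an assumed QC rule for this sketch).
    """
    shared = panel_samples(snp_chip) & panel_samples(exome_chip)
    merged: Panel = {}
    for panel in (snp_chip, exome_chip):
        for variant, calls in panel.items():
            target = merged.setdefault(variant, {s: None for s in shared})
            for sample in shared:
                genotype = calls.get(sample)
                if genotype is None:
                    continue  # nothing to add from this chip
                if target[sample] is None:
                    target[sample] = genotype
                elif target[sample] != genotype:
                    target[sample] = None  # discordant between chips
    return merged


if __name__ == "__main__":
    snp = {("1", 100, "A", "G"): {"s1": 0, "s2": 1}}
    exome = {("1", 300, "G", "A"): {"s1": 0, "s2": 1, "s3": 0}}
    print(merge_panels(snp, exome))
```

In practice this step would normally be done on standard genotype files with established tools; the dictionary representation here is only meant to make the union-of-variants, intersection-of-samples idea concrete.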
In this study, we describe a combined imputation approach that uses exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach were demonstrated using a reference panel of 848 samples, constructed from exome sequencing data from the T2D-GENES consortium, and a genotype panel of 5,349 samples consisting of exome chip and SNP chip data. Compared with imputation using the SNP chip alone, the combined approach increased imputation quality by up to 11% and genomic coverage for rare variants (MAF < 1%) by up to 117.7%. We also investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best-performing approach was the combination of the study-specific reference panel and the genotype panel of combined data.
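To make the coverage comparison concrete, the following sketch counts rare variants that pass a quality filter under two genotype panels and reports the relative increase. The abstract does not state which quality metric or cutoff was used, so the `quality` field, the 0.4 cutoff, and the names `ImputedVariant`, `rare_coverage`, and `percent_increase` are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ImputedVariant:
    variant_id: str
    maf: float      # minor allele frequency
    quality: float  # assumed per-variant imputation quality score in [0, 1]


def rare_coverage(
    variants: Iterable[ImputedVariant],
    maf_cutoff: float = 0.01,     # MAF < 1%, as in the text
    quality_cutoff: float = 0.4,  # hypothetical cutoff, not from the study
) -> int:
    """Count rare variants whose imputation quality passes the cutoff."""
    return sum(
        1
        for v in variants
        if v.maf < maf_cutoff and v.quality >= quality_cutoff
    )


def percent_increase(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    # Toy values only; these are not results from the study.
    snp_only = [
        ImputedVariant("v1", 0.004, 0.55),
        ImputedVariant("v2", 0.008, 0.20),
    ]
    combined = [
        ImputedVariant("v1", 0.004, 0.70),
        ImputedVariant("v2", 0.008, 0.45),
    ]
    base = rare_coverage(snp_only)
    new = rare_coverage(combined)
    print(base, new, percent_increase(base, new))
```

Under this definition, an increase above 100% simply means the combined panel yields more than twice as many well-imputed rare variants as the SNP chip alone.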
Our study demonstrates that combined datasets, including SNP chips and exome chips, enhance both the imputation quality and the genomic coverage of rare variants.