Huang Guan-Hua, Tseng Yi-Chi
Institute of Statistics, National Chiao Tung University, 1001 University Road, Hsinchu 30010, Taiwan.
BMC Proc. 2014 Jun 17;8(Suppl 1 Genetic Analysis Workshop 18):S64. doi: 10.1186/1753-6561-8-S1-S64. eCollection 2014.
Genome-wide association studies have successfully identified common variants associated with complex diseases. However, most of the genetic variants contributing to disease susceptibility have yet to be discovered. It is now widely believed that multiple rare variants are likely to be associated with complex diseases. Using custom-made chips or next-generation sequencing to uncover the effects of rare variants on disease can be very expensive with current technology. Consequently, many researchers use genotype imputation to predict the genotypes at rare variants that are not directly genotyped in the study sample. An important question in genotype imputation is how to choose a reference panel that will produce high imputation accuracy in a population of interest. Using whole genome sequence data from the Genetic Analysis Workshop 18 data set, this report compares genotype imputation accuracy among reference panels representing different degrees of genetic similarity to a study sample of admixed Mexican Americans. The results show that a reference panel that closely matches the ancestry of the study population can increase imputation accuracy, but it can also result in more missing genotype calls. A larger reference panel can reduce imputation errors and missing genotypes, but the improvement may be limited. We also find that, for the admixed study sample, simply selecting a single best reference panel from among the HapMap African, European, and Asian populations is not appropriate. Instead, a composite reference panel combining all available reference data should be used.
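The abstract compares reference panels on two quantities: imputation error and missing genotype calls. As a minimal sketch of how such quantities might be computed for one panel, the Python below scores imputed genotypes against sequenced ("true") genotypes. The function name `imputation_summary`, the 0/1/2 allele-count coding, and the use of `None` for an uncalled genotype are illustrative assumptions, not details taken from the report, and the report's exact definitions may differ.

```python
from typing import Optional, Sequence, Tuple


def imputation_summary(
    truth: Sequence[int],
    imputed: Sequence[Optional[int]],
) -> Tuple[float, float]:
    """Return (error_rate, missing_rate) for one set of imputed genotypes.

    Assumptions (hypothetical, for illustration only):
    - genotypes are coded as minor-allele counts 0, 1, or 2;
    - None marks a genotype the imputation software left uncalled;
    - the error rate is computed over called genotypes only.
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    total = len(truth)
    if total == 0:
        raise ValueError("at least one genotype is required")

    # Keep only the positions where a genotype call was made.
    called = [(t, g) for t, g in zip(truth, imputed) if g is not None]
    missing_rate = (total - len(called)) / total

    if not called:
        # No calls at all: the error rate is undefined.
        return float("nan"), missing_rate

    # Count called genotypes that disagree with the sequenced genotype.
    errors = sum(1 for t, g in called if t != g)
    return errors / len(called), missing_rate


# Toy data, not results from the study: five genotypes, one uncalled, one wrong.
truth = [0, 1, 2, 1, 0]
imputed = [0, 1, None, 2, 0]
error_rate, missing_rate = imputation_summary(truth, imputed)
```

Reporting the two rates separately mirrors the trade-off described in the abstract, where a closely matched panel can lower the error rate while leaving more genotypes uncalled.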