Hwang Mi Yeong, Choi Nak-Hyeon, Won Hong Hee, Kim Bong-Jo, Kim Young Jin
Division of Genome Science, Department of Precision Medicine, National Institute of Health, Cheongju-si, South Korea.
Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Samsung Medical Center, Sungkyunkwan University, Seoul, South Korea.
Front Genet. 2022 Nov 24;13:1008646. doi: 10.3389/fgene.2022.1008646. eCollection 2022.
Genotype imputation is essential for increasing the power of association mapping and for discovering rare variants and indels that most genotyping arrays miss. Imputation can be more accurate with a population-specific reference panel or with a multi-ethnic reference panel that contains many samples. The National Institute of Health, Republic of Korea, initiated the Korean Reference Genome (KRG) project to identify variants in the whole-genome sequences of ∼20,000 Korean participants. In the pilot phase, we analyzed data from 1,490 participants. We compared the genetic characteristics and imputation performance of the KRG with those of the 1000 Genomes Project Phase 3, GenomeAsia 100K Project, ChinaMAP, NARD, and TOPMed reference panels. For this comparison, we artificially generated genotype panels from whole-genome sequencing data of four ancestries (Korean, Japanese, Chinese, and European) combined with two population-optimized microarrays (the Korea Biobank Array [KBA] and the UK Biobank Array [UKB]). The KRG reference panel performed best for the Korean population (r = 0.78–0.84; 91.9% of variants with allele frequency >5% were well imputed), even though the other reference panels contained more samples from genetically different backgrounds. Across the multiple reference panels and multi-ethnic genotype panels, imputation was best when the reference panel came from a genetically related population and the genotype panel used a population-optimized microarray. Indeed, the KRG and TOPMed reference panels performed best on the KBA (r = 0.84) and UKB (r = 0.87) genotype panels, respectively. Merging imputation results from different reference panels by meta-imputation increased imputation accuracy for rare variants (∼7%) and yielded additional well-imputed variants (∼20%) with imputation accuracy comparable to that of the KRG.
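The accuracy figures above come from comparing imputed dosages with the true genotypes available from whole-genome sequencing. The following is a minimal sketch of that kind of evaluation, not the authors' pipeline. It assumes a per-variant Pearson correlation r between true genotype (alternate-allele count 0/1/2) and imputed dosage, interprets "allele frequency >5%" as minor allele frequency, and uses a hypothetical cutoff of 0.8 for "well imputed"; the abstract does not state the exact definitions. Many imputation studies report the squared correlation instead.

```python
from math import isnan, nan, sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; nan if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def summarize_accuracy(
    true_gt: Dict[str, List[int]],    # variant ID -> WGS genotypes (alt-allele counts 0/1/2)
    dosages: Dict[str, List[float]],  # variant ID -> imputed dosages for the same samples
    min_maf: float = 0.05,            # assumption: filter on minor allele frequency > 5%
    well_imputed: float = 0.8,        # hypothetical cutoff for "well imputed"
) -> Tuple[float, float]:
    """Return (mean r, fraction of variants with r >= cutoff) over variants passing the filter."""
    rs = []
    for vid, gt in true_gt.items():
        af = sum(gt) / (2 * len(gt))  # alternate allele frequency in the truth set
        if min(af, 1 - af) <= min_maf:
            continue                  # rare or monomorphic: outside this frequency bin
        r = pearson_r(gt, dosages[vid])
        if not isnan(r):              # skip variants whose dosages are constant
            rs.append(r)
    if not rs:
        raise ValueError("no variants passed the frequency filter")
    return sum(rs) / len(rs), sum(r >= well_imputed for r in rs) / len(rs)


# Toy example with six samples (made-up values, for illustration only).
truth = {
    "v1": [0, 1, 2, 1, 0, 2],
    "v2": [0, 0, 0, 0, 0, 1],
    "v3": [0, 0, 0, 0, 0, 0],  # monomorphic, filtered out
    "v4": [0, 1, 2, 1, 0, 2],
}
imputed = {
    "v1": [0.1, 0.9, 1.8, 1.2, 0.0, 2.0],
    "v2": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    "v3": [0.0] * 6,
    "v4": [2.0, 1.0, 0.0, 1.0, 2.0, 0.0],  # badly imputed (inverted)
}
mean_r, frac = summarize_accuracy(truth, imputed)
print(f"mean r = {mean_r:.3f}, well-imputed fraction = {frac:.3f}")
```

Binning by frequency before summarizing matters because accuracy drops sharply for rare variants, so a single genome-wide average would hide the differences between reference panels that the abstract reports.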
Our results demonstrate the importance of using a population-specific reference panel and meta-imputation to assess a substantial number of accurately imputed rare variants.
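Meta-imputation, as summarized above, merges the results of imputing the same samples against different reference panels. The sketch below is a deliberately simplified stand-in and does not reproduce the method or tool used in the study: it assumes fixed, hypothetical per-panel weights and takes a weighted average of dosages wherever several panels impute the same variant. Dedicated meta-imputation tools estimate their weights from the data rather than fixing them in advance.

```python
from typing import Dict, List

# Variant ID -> per-sample imputed dosages from one reference panel.
Dosages = Dict[str, List[float]]


def meta_impute(panels: Dict[str, Dosages], weights: Dict[str, float]) -> Dosages:
    """Merge per-panel dosages by a weighted average (simplified illustration).

    Variants imputed by several panels get the weighted mean of their dosages;
    variants imputed by only one panel are carried over unchanged, which is how
    merging can add variants that any single panel lacks.
    """
    merged: Dosages = {}
    all_ids = sorted(set().union(*panels.values()))  # union of variant IDs across panels
    for vid in all_ids:
        hits = [(name, d[vid]) for name, d in panels.items() if vid in d]
        total_w = sum(weights[name] for name, _ in hits)
        if total_w <= 0:
            raise ValueError(f"non-positive total weight for {vid}")
        n_samples = len(hits[0][1])
        merged[vid] = [
            sum(weights[name] * dos[i] for name, dos in hits) / total_w
            for i in range(n_samples)
        ]
    return merged


# Toy dosages for three samples; panel labels reuse names from the text,
# but the values and weights are hypothetical.
panels = {
    "KRG": {"v1": [0.0, 1.0, 2.0], "v2": [1.0, 1.0, 0.0]},
    "TOPMed": {"v1": [0.4, 1.0, 1.6], "v3": [2.0, 0.0, 1.0]},
}
weights = {"KRG": 0.75, "TOPMed": 0.25}
merged = meta_impute(panels, weights)
for vid, dos in merged.items():
    print(vid, [round(x, 3) for x in dos])
```

Here v1 is shared and gets a weighted blend, while v2 and v3 each come from a single panel, so the merged set covers more variants than either input.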