Personal Genomics Institute, Genome Research Foundation, Cheongju, 28190, Republic of Korea.
Department of Biology, University of New Mexico, Albuquerque, NM, 87131, USA.
Sci Rep. 2018 Apr 4;8(1):5677. doi: 10.1038/s41598-018-23837-x.
High-coverage whole-genome sequencing data from a single ethnic group can provide a useful catalogue of population-specific genetic variation and a critical resource for identifying pathogenic genetic variants more accurately. We report a comprehensive analysis of the Korean population and present the Korean National Standard Reference Variome (KoVariome). As part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database from 5.5 terabases of whole-genome sequence data from 50 healthy Korean individuals in order to characterize the benign, ethnicity-relevant genetic variation present in the Korean population. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Of these, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals, which were used to filter out 1,271 coding SNVs that had not been removed by filtering against the 1000 Genomes Project when prioritizing disease-causing variants. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variations.
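The filtering step mentioned in the abstract, in which variants common in a reference population are dropped from a list of disease-causing candidates, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Variant` type, the `filter_common` helper, the 1% frequency cutoff, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_common(
    candidates: List[Variant],
    population_af: Dict[Variant, float],
    max_af: float = 0.01,  # assumed cutoff, not a value from the paper
) -> List[Variant]:
    """Keep candidates that are absent from the population table or rarer than max_af."""
    return [v for v in candidates if population_af.get(v, 0.0) < max_af]


# Toy data, made up for illustration (not taken from KoVariome).
candidates = [
    Variant("1", 1000, "A", "G"),
    Variant("2", 2000, "C", "T"),
    Variant("3", 3000, "G", "A"),
]
population_af = {
    Variant("1", 1000, "A", "G"): 0.12,   # common in the reference population
    Variant("2", 2000, "C", "T"): 0.002,  # rare in the reference population
}

remaining = filter_common(candidates, population_af)
```

In this toy case the first candidate is discarded because it is common in the reference table, while the rare and the unobserved candidates are kept for further prioritization. A population-matched table matters here because a variant that looks rare in a global catalogue may turn out to be common, and probably benign, in the population being studied.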