Dung Le Thi, Lam Le Tung, Trang Nguyen Hong, Anh Nguyen Vu Hung, Nam Nguyen Ngoc, Nhung Doan Thi, Linh Tran Huyen, Giang Le Ngoc, Ha Hoang, Huy Nguyen Quang, Hai Truong Nam
Institute of Biology, Vietnam Academy of Science and Technology (VAST), Hanoi 10072, Vietnam.
Department of Life Sciences, University of Science and Technology of Hanoi (USTH), Vietnam Academy of Science and Technology (VAST), Hanoi 10072, Vietnam.
Genes (Basel). 2025 Apr 29;16(5):536. doi: 10.3390/genes16050536.
Population-specific reference genomes are essential for improving the accuracy and reliability of genomic analyses across diverse human populations. Although Vietnam is the 16th most populous country in the world and more than 86% of its population identifies as Kinh, studies focused on a Kinh Vietnamese reference genome remain scarce. Constructing such a reference genome is therefore valuable for genetic research on the Vietnamese population. In this study, we combined PacBio long-read sequencing and Bionano optical mapping data to generate a de novo assembly of a Kinh Vietnamese genome (VHG), which was subsequently polished using multiple Kinh Vietnamese short-read whole-genome sequences (WGSs). The final assembly, named VHG1.2, comprised 3.22 gigabase pairs of high-quality sequence and showed high accuracy (QV: 48), completeness (BUSCO: 92%), and continuity (295 super scaffolds, super scaffold N50: 50 Kbp). Using multiple bioinformatic tools for variant calling, we observed substantial differences in the variants called against the population-specific reference VHG1.2 compared with the standard reference genome hg38. Overall, our genome assembly demonstrates the advantages of a hybrid long-read sequencing approach for de novo assembly and highlights the benefit of using population-specific reference genomes in population genomic analysis.
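The accuracy and continuity figures quoted in the abstract (QV and N50) are summary statistics with simple definitions, and a short sketch may help readers interpret them. The code below is illustrative only and is not the authors' pipeline: the function names and the toy scaffold lengths are hypothetical, and it assumes QV is a Phred-scaled consensus quality and that N50 follows the usual definition (the length of the shortest sequence among the longest sequences that together cover at least half of the assembly).

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length at which the running total first reaches half of the sum.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy scaffold lengths, NOT taken from the VHG1.2 assembly.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))    # total is 300; 80 + 70 reaches 150, so N50 is 70
print(qv_to_error_rate(48))  # roughly 1.6e-5
```

Under the Phred assumption, a QV of 48 corresponds to an expected error rate of about 1.6 x 10^-5, or roughly one error per 63,000 bases.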