Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
Oxford Nanopore Technologies, Oxford, UK.
Nat Biotechnol. 2023 Oct;41(10):1474-1482. doi: 10.1038/s41587-023-01662-6. Epub 2023 Feb 16.
The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.
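As a rough illustration of the graph structure the abstract refers to, the following minimal sketch builds a plain de Bruijn graph from a few toy reads. It is not Verkko's implementation and does not model the multiplex variant, which varies k across the graph. The function name `build_de_bruijn`, the toy reads, and the choice of k=4 are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a plain de Bruijn graph (hypothetical helper, for illustration).

    Nodes are (k-1)-mers; each k-mer found in a read contributes one edge
    from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads (assumed data): two overlapping reads from a short tandem repeat.
reads = ["ACGTAC", "CGTACG"]
graph = build_de_bruijn(reads, k=4)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy case the four nodes form a cycle (ACG, CGT, GTA, TAC, back to ACG), which is how a repeat shows up in such a graph. Resolving that kind of ambiguity is the role the abstract assigns to ultra-long reads and haplotype-specific markers.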