Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Baden-Württemberg, Germany.
Genome Res. 2024 Nov 20;34(11):1919-1930. doi: 10.1101/gr.279334.124.
The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Biosciences (PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach to complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely studied genomes: human HG002, Heinz 1706 (tomato), and B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to that of HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the UL reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and provides a multirun single-instrument solution for the reconstruction of complete genomes.
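The abstract equates a base accuracy of 99.999% with Q50. That correspondence follows from the standard Phred scale, Q = -10 * log10(error rate). The short Python sketch below illustrates the conversion; the function names are hypothetical and chosen for illustration only, and nothing here is taken from the paper's own tooling.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy (fraction in [0, 1)) to a Phred-scaled Q value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1); Q is unbounded at 1.0")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def accuracy_from_phred(q: float) -> float:
    """Convert a Phred-scaled Q value back to a per-base accuracy fraction."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.999% accuracy corresponds to one error per 100,000 bases.
    print(f"Q for 99.999% accuracy: {phred_from_accuracy(0.99999):.2f}")
    print(f"Accuracy for Q50: {accuracy_from_phred(50.0):.5%}")
```

On this scale each additional 10 Q points is a tenfold reduction in error rate, so Q50 corresponds to roughly one error per 100,000 bases.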