Mozheiko Evgeniy, Yi Heng, Lu Anzhi, Kong Heitung, Hou Yong, Zhou Yan, Gao Hui
MGI Tech, Shenzhen 518083, China.
MGI Tech, Riga 49276, Latvia.
Genome. 2025 Jan 1;68:1-9. doi: 10.1139/gen-2024-0132.
Recently developed hybrid assembly methods can achieve telomere-to-telomere (T2T) completeness for some chromosomes. However, such approaches require sequencing large volumes of both Pacific Biosciences high-fidelity (HiFi) and Oxford Nanopore Technologies (ONT) reads. At the same time, third-generation sequencing techniques are advancing rapidly, increasing the available read length and accuracy. To reduce the final cost of genome assembly, we investigated the possibility of assembling from low-coverage samples and from ONT reads alone, corrected by next-generation sequencing (NGS) reads. We demonstrated that haploid ONT-based assembly approaches corrected by NGS can achieve performance metrics comparable to those of more expensive hybrid approaches based on HiFi sequencing. We investigated the assembly of different chromosomes and the low-coverage performance of state-of-the-art hybrid assembly tools, including Verkko and Hifiasm, as well as ONT-based assemblers such as Shasta and Flye. We found that even when Verkko and Hifiasm produce a one-contig T2T assembly, numerous misassemblies remain within the centromere. Therefore, we recommend using a combination of regular R9 or simplex R10 ONT reads and accurate NGS reads for assembly without aiming for T2T completeness. Additionally, we rigorously evaluated the performance of MGI, Illumina, and stLFR NGS technologies across various aspects of hybrid genome assembly, including pre-assembly correction, haplotype phasing, and polishing.