Genome Informatics Laboratory, National Institute of Genetics.
Division of Life Sciences, Center for Computational Sciences, University of Tsukuba, Japan.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad337.
Although current long-read sequencing technologies produce reads long enough to facilitate genome assembly, these reads have high error rates. Various assemblers based on different approaches have been developed, but no systematic evaluation of long-read assemblers on diploid genomes with varying heterozygosity has been performed. Here, we evaluated a series of processes, including the estimation of genome characteristics such as genome size and heterozygosity, de novo assembly, polishing, and removal of allelic contigs, using six genomes with various heterozygosity levels. We evaluated five long-read-only assemblers (Canu, Flye, miniasm, NextDenovo and Redbean) and five hybrid assemblers that combine short and long reads (HASLR, MaSuRCA, Platanus-allee, SPAdes and WENGAN). We then proposed a concrete guideline for constructing a haplotype representation according to the degree of heterozygosity, followed by polishing and purging of haplotigs, using the stable and high-performance assemblers Redbean, Flye and MaSuRCA.