IEEE/ACM Trans Comput Biol Bioinform. 2020 Jan-Feb;17(1):334-338. doi: 10.1109/TCBB.2018.2875479. Epub 2018 Oct 11.
De novo genome assembly is a challenging computational problem for which several pipelines have been developed. The advent of long-read sequencing technology has produced a new set of algorithmic approaches to assembly. In this work, we show that one of these fast long-read assembly techniques (using Minimap2 and Miniasm) can be adapted to short-read assembly. We compare our proposed de novo assembly pipeline (MiniSR) with three other recently developed programs for assembling bacterial and small eukaryotic genomes. We document two trade-offs: one between speed and accuracy, and the other between contiguity and base-calling errors. Our proposed pipeline strikes a good balance in both. It is 6 and 2.2 times faster than the short-read assemblers SPAdes and SGA, respectively. MiniSR generates assemblies with higher N50 and NGA50 than SGA, although its assemblies are less complete and less accurate than those from SPAdes. A third tool, SOAPdenovo2, is as fast as our proposed pipeline but produces assemblies of poorer quality.
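The contiguity metrics named above can be illustrated with a minimal sketch. N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length; NG50 uses the reference genome length in place of the assembly length. The function name `nx50` and its parameters are hypothetical and are not part of MiniSR. NGA50 additionally requires aligning contigs to a reference and splitting them at misassemblies, which this sketch does not attempt.

```python
from typing import Iterable, Optional


def nx50(contig_lengths: Iterable[int],
         reference_length: Optional[int] = None) -> Optional[int]:
    """Return N50, or NG50 when reference_length is given.

    Hypothetical helper for illustration only.
    Returns None if the contigs never reach half of the target length
    (possible for NG50 when the assembly is much shorter than the reference).
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    # N50 measures against the assembly itself; NG50 against the reference.
    target = reference_length if reference_length is not None else sum(lengths)
    half = target / 2

    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return None


if __name__ == "__main__":
    contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50:", nx50(contigs))
    print("NG50 (reference length 40):", nx50(contigs, reference_length=40))
```

In practice these values are usually taken from an assembly evaluation tool rather than computed by hand; the sketch only makes the definition concrete.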