Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany.
BMC Bioinformatics. 2012 May 6;13:82. doi: 10.1186/1471-2105-13-82.
Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads.
Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process and the graph is efficiently constructed including irreducible edges only.
Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.
新一代测序技术的通量不断提高,这给新一代从头序列组装器带来了挑战。最近的序列组装器都是基于构建 de Bruijn 图。另一个越来越受到关注的框架是组装字符串图,它不需要将读取序列划分成 k-mers,而是需要快速算法来计算所有读取序列对之间的后缀-前缀匹配。
本文提出了一种从一组测序读取中构建字符串图的有效方法。我们的方法采用后缀排序和扫描方法来计算后缀-前缀匹配。在处理过程中,会及早识别和消除传递边,并仅构建有效的不可约边。
我们的后缀-前缀匹配确定和字符串图构建算法已经在 Readjoiner 软件包中实现。与现有的基于字符串图的组装器的比较表明,Readjoiner 更快、更节省空间。Readjoiner 可在 http://www.zbh.uni-hamburg.de/readjoiner 获得。