Schmidt Bertil, Sinha Ranjan, Beresford-Smith Bryan, Puglisi Simon J
School of Computer Engineering, Nanyang Technological University, Singapore.
Bioinformatics. 2009 Sep 1;25(17):2279-80. doi: 10.1093/bioinformatics/btp374. Epub 2009 Jun 17.
The shorter and vastly more numerous reads produced by second-generation sequencing technologies require new tools that can assemble massive numbers of reads in reasonable time. Existing short-read assembly tools fall into two categories: greedy extension-based and graph-based. While graph-based approaches are generally superior in assembly quality, building and storing a huge graph demands very large computing resources. In this article, we present Taipan, an assembly algorithm that can be viewed as a hybrid of these two approaches. Taipan uses greedy extensions for contig construction, but at each step it constructs just enough of the corresponding read graph to make better decisions about how assembly should continue. We show that this approach achieves an assembly quality at least as good as that of the graph-based approaches used in the popular Edena and Velvet assembly tools, while using only a moderate amount of computing resources.
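The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. This is not Taipan's actual algorithm, whose details are not given here. It only shows the general pattern of greedy extension combined with a local look at the overlap graph: at each step the candidate reads overlapping the contig end are collected, and the contig is extended only if those candidates agree with one another. The function names `overlap` and `extend_contig`, the `min_len` parameter, and the stop-on-branch rule are all assumptions made for illustration.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len`; otherwise return 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def extend_contig(seed, reads, min_len=4):
    """Greedily extend `seed` to the right (hypothetical helper).

    At each step only the local neighbourhood of the read graph is
    inspected: the unused reads that overlap the current contig end.
    """
    contig = seed
    unused = set(reads) - {seed}
    while True:
        # Collect (overlap length, new bases, read) for each overlapping read.
        candidates = []
        for read in unused:
            k = overlap(contig, read, min_len)
            if 0 < k < len(read):
                candidates.append((k, read[k:], read))
        if not candidates:
            return contig  # dead end: nothing overlaps the contig end

        # Local consistency check: every proposed extension must be a
        # prefix of the longest one. A disagreement signals a branch.
        longest_tail = max((tail for _, tail, _ in candidates), key=len)
        if any(not longest_tail.startswith(tail) for _, tail, _ in candidates):
            return contig  # ambiguous branch: stop rather than guess

        # Greedy choice: take the read with the largest overlap.
        candidates.sort(key=lambda c: (-c[0], c[2]))
        _, tail, read = candidates[0]
        contig += tail
        unused.discard(read)


# Reads of length 6 sampled every 2 bases from a toy sequence.
toy_reads = ["ATGCGT", "GCGTAC", "GTACGT", "ACGTTA", "GTTAGC"]
assembled = extend_contig("ATGCGT", toy_reads, min_len=4)

# Two reads that propose conflicting extensions of the same seed.
branching = extend_contig("AAAACG", ["AAAACG", "AACGTT", "AACGGG"], min_len=4)
```

In this sketch a purely greedy assembler would pick one of the conflicting reads in the second example arbitrarily, whereas the local consistency check stops the extension at the branch. A real assembler would resolve such branches using more information, such as coverage or deeper look-ahead in the graph, and would also handle reverse complements and sequencing errors.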