Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA.
BMC Bioinformatics. 2013 Jun 7;14:184. doi: 10.1186/1471-2105-14-184.
The development of next-generation sequencing instruments has made it possible to generate millions of short sequences in a single run. Aligning these reads to a reference genome is time-consuming and demands fast and accurate alignment tools. However, the currently proposed tools make different compromises between mapping accuracy and speed. Moreover, many important aspects are overlooked when the performance of a newly developed tool is compared with the state of the art. There is therefore a need for an objective evaluation method that covers all of these aspects. In this work, we introduce a benchmarking suite that extensively analyzes mapping tools with respect to various aspects and provides an objective comparison.
We applied our benchmarking tests to nine well-known mapping tools, namely Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST), using synthetic data and real RNA-Seq data. MAQ and RMAP build hash tables for the reads, whereas the remaining tools index the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput in most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the tools mentioned here and can be applied to others.
Mapping remains a hard problem that is affected by many factors. In this work, we provide a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. No tool outperforms all of the others in all of the tests. End users should therefore clearly specify their needs in order to choose the tool that provides the best results.
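The distinction between hashing the reads and indexing the reference genome can be illustrated with a toy sketch. This is not the implementation of any of the benchmarked tools: it handles exact matches only, uses a plain k-mer dictionary for both strategies, and all function names and the seed length `k` are hypothetical choices made for illustration. Real mappers tolerate mismatches and use far more compact index structures.

```python
from collections import defaultdict


def map_by_hashing_reads(reference, reads, k):
    """Hash the reads, then scan the reference once (toy, exact match only)."""
    # Table keyed by the first k bases of each read.
    table = defaultdict(list)
    for read_id, read in enumerate(reads):
        table[read[:k]].append(read_id)

    hits = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        for read_id in table.get(reference[pos:pos + k], ()):
            read = reads[read_id]
            # Verify the full read against the reference at this position.
            if reference[pos:pos + len(read)] == read:
                hits[read_id].append(pos)
    return dict(hits)


def build_reference_index(reference, k):
    """Index the reference: every k-mer maps to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_by_indexing_reference(reference, index, reads, k):
    """Look up each read's first k bases in a prebuilt reference index."""
    hits = defaultdict(list)
    for read_id, read in enumerate(reads):
        for pos in index.get(read[:k], ()):
            if reference[pos:pos + len(read)] == read:
                hits[read_id].append(pos)
    return dict(hits)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    reads = ["ACGT", "TTAG", "GGGG"]
    k = 3
    print(map_by_hashing_reads(reference, reads, k))
    index = build_reference_index(reference, k)
    print(map_by_indexing_reference(reference, index, reads, k))
```

Both functions return the same mapping from read number to matching reference positions. The practical difference in this sketch is what gets built up front: the read table must be rebuilt for every new set of reads, whereas the reference index is built once and can be reused across runs.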