Benchmarking short sequence mapping tools.

Affiliation

Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA.

Publication information

BMC Bioinformatics. 2013 Jun 7;14:184. doi: 10.1186/1471-2105-14-184.

DOI:10.1186/1471-2105-14-184

PMID:23758764

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3694458/

Abstract

BACKGROUND

The development of next-generation sequencing instruments has made it possible to generate millions of short sequences in a single run. Aligning these reads to a reference genome is time-consuming and demands fast and accurate alignment tools. However, the currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked when the performance of a newly developed tool is compared with the state of the art. Therefore, there is a need for an objective evaluation method that covers all of these aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison.

RESULTS

We applied our benchmarking tests to nine well-known mapping tools, namely Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST), using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be applied to others.
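
The distinction between hashing the reads and indexing the reference genome can be illustrated with a toy sketch. The code below is not how any of the benchmarked tools is implemented. It assumes exact matching only, uses the first k bases of each read as a seed, and all function names are hypothetical. Real mappers tolerate mismatches and gaps and rely on far more compact index structures.

```python
from collections import defaultdict


def index_reference(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_with_reference_index(reads, reference, k):
    """Index the genome once, then look up each read's first k-mer as a seed."""
    index = index_reference(reference, k)
    hits = {}
    for read in reads:
        # Verify each seed hit by comparing the full read (exact match only).
        hits[read] = [p for p in index.get(read[:k], [])
                      if reference[p:p + len(read)] == read]
    return hits


def map_with_read_hash(reads, reference, k):
    """Hash the reads by their first k-mer, then scan the genome once."""
    table = defaultdict(set)
    for read in reads:
        table[read[:k]].add(read)
    hits = {read: [] for read in reads}
    for pos in range(len(reference) - k + 1):
        for read in table.get(reference[pos:pos + k], ()):
            if reference[pos:pos + len(read)] == read:
                hits[read].append(pos)
    return hits


# Illustrative data, not taken from the paper.
reference = "ACGTACGTTAGCACGT"
reads = ["ACGTT", "TAGCA", "GGGGG"]
by_index = map_with_reference_index(reads, reference, k=4)
by_hash = map_with_read_hash(reads, reference, k=4)
```

Both functions report the same positions. They differ in which structure is built up front: the reference index can be reused across read sets, whereas the read hash table must be rebuilt for every new batch of reads.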

CONCLUSION

The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, no tool outperforms all of the others in all the tests. Therefore, end users should clearly specify their needs in order to choose the tool that provides the best results.


