Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments.

Authors

Yim Won Cheol, Cushman John C

Affiliation

Department of Biochemistry and Molecular Biology, University of Nevada-Reno, Reno, NV, United States of America.

Publication

PeerJ. 2017 Jun 22;5:e3486. doi: 10.7717/peerj.3486. eCollection 2017.

Abstract

Bioinformatics now faces very large data sets, and the resulting computational jobs, especially sequence similarity searches, can take impractically long to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. Although the BLAST suite performs searches very rapidly, it can still be accelerated. In recent years, distributed computing environments have become more accessible and more widely used as high-performance computing (HPC) systems have become more available. Simple solutions for data parallelization are therefore needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. To address this, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from 1 to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and overcomes the speed limitation of single-node BLAST programs. It can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. This freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
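
The query sequence distribution idea mentioned in the abstract can be illustrated with a short sketch. This is not DCBLAST's own code: the function names (`parse_fasta`, `split_queries`, `blast_command`), the greedy length-balancing heuristic, and the choice of `blastn` with tabular output are all assumptions made for illustration. The sketch splits a query FASTA into chunks of roughly equal total sequence length, so that each chunk could be submitted as an independent BLAST job against the same database and the per-chunk outputs concatenated afterwards.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text lines into (header, sequence) records."""
    records: List[Record] = []
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_queries(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute query records over n_chunks bins.

    Assumed heuristic (not taken from the paper): place the longest
    remaining sequence into the bin with the smallest total length,
    so that chunks carry similar amounts of sequence.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    bins: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (total length, bin index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)
        bins[i].append(rec)
        heapq.heappush(heap, (total + len(rec[1]), i))
    return bins


def blast_command(chunk_path: str, db: str, out_path: str) -> List[str]:
    """Build a BLAST+ command line for one chunk (hypothetical paths)."""
    return ["blastn", "-query", chunk_path, "-db", db,
            "-out", out_path, "-outfmt", "6"]
```

In practice, each chunk would be written to its own FASTA file and the command returned by `blast_command` submitted through whatever scheduler the cluster provides. Because each query sequence is searched independently, the tabular outputs can simply be concatenated once all jobs finish.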

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e2f/5483034/c1f8853cd1bb/peerj-05-3486-g001.jpg
