Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments.

Authors

Yim Won Cheol, Cushman John C

Affiliation

Department of Biochemistry and Molecular Biology, University of Nevada-Reno, Reno, NV, United States of America.

Publication

PeerJ. 2017 Jun 22;5:e3486. doi: 10.7717/peerj.3486. eCollection 2017.

DOI:10.7717/peerj.3486
PMID:28652936
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5483034/
Abstract

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. This freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
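The query sequence distribution approach mentioned in the abstract can be illustrated with a minimal sketch. This is not DCBLAST's own code: the function names `parse_fasta` and `split_queries`, and the greedy longest-first balancing rule, are assumptions chosen for illustration. The idea is only that a large query FASTA file is divided into roughly equal chunks, each of which could then be searched by an independent BLAST job on a different node, with the per-chunk outputs concatenated afterwards.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_queries(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute query records over n_chunks bins of similar total length.

    Hypothetical balancing rule: take the longest sequences first and put
    each one into the bin that currently holds the fewest residues.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    bins: List[List[Record]] = [[] for _ in range(n_chunks)]
    loads = [0] * n_chunks
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        target = loads.index(min(loads))  # least-loaded bin so far
        bins[target].append(rec)
        loads[target] += len(rec[1])
    return bins


if __name__ == "__main__":
    demo = [">a", "ACGTACGTAC", ">b", "ACGTAC", ">c", "ACGT", ">d", "AC"]
    for i, chunk in enumerate(split_queries(parse_fasta(demo), 2)):
        print(i, [header for header, _ in chunk])
```

Because each chunk is searched against the same database independently, the chunks need no communication with one another, which is why this kind of data parallelism suits cluster schedulers. Balancing by residue count rather than by number of sequences is one plausible choice, since BLAST run time depends on query length as well as query count.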

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e2f/5483034/ef813efe2014/peerj-05-3486-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e2f/5483034/c1f8853cd1bb/peerj-05-3486-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e2f/5483034/a21e25e1100f/peerj-05-3486-g002.jpg
