
SparkBLAST: scalable BLAST processing using in-memory operations.

Author information

de Castro Marcelo Rodrigo, Tostes Catherine Dos Santos, Dávila Alberto M R, Senger Hermes, da Silva Fabricio A B

Affiliations

Computer Science Department, Federal University of São Carlos, Rod. Washington Luís, Km 235, São Carlos, 21040-900, Brazil.

LBCS-IOC, Oswaldo Cruz Foundation, Av Brasil 4365, Rio de Janeiro, 21040-900, Brazil.

Publication information

BMC Bioinformatics. 2017 Jun 27;18(1):318. doi: 10.1186/s12859-017-1723-8.

Abstract

BACKGROUND

The demand for processing ever-increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, some radionuclide-resistant bacterial genomes were selected for similarity analysis.
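The parallelization strategy described above, where query sequences are split into partitions and each partition is searched independently against the same database, can be illustrated with a minimal, cluster-free sketch. This is not SparkBLAST's code. A thread pool stands in for Spark executors, and `toy_search`, a hypothetical shared-k-mer scorer, stands in for BLAST. All function names here are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def partition(records, n):
    """Split records round-robin into n partitions (analogous to RDD partitions)."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_search(part, database, k=3):
    """Hypothetical stand-in for BLAST: for each query, report the database
    sequence sharing the most k-mers with it (None if nothing is shared)."""
    hits = []
    for qid, qseq in part:
        qk = kmers(qseq, k)
        best_id, best_score = None, 0
        for did, dseq in database:
            score = len(qk & kmers(dseq, k))
            if score > best_score:
                best_id, best_score = did, score
        hits.append((qid, best_id, best_score))
    return hits


def parallel_search(query_fasta, database, n_partitions=4):
    """Partition the queries, search each partition concurrently, merge the hits."""
    parts = partition(parse_fasta(query_fasta), n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        per_part = list(pool.map(lambda p: toy_search(p, database), parts))
    merged = [hit for hits in per_part for hit in hits]
    return sorted(merged, key=lambda hit: hit[0])  # order hits by query id
```

Because each partition is searched independently, adding workers increases throughput without changing the results. Python threads are used only for brevity: CPU-bound work like this would need separate processes or nodes to run truly in parallel.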

RESULTS

Experiments in Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution times.

CONCLUSIONS

The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, consequently reducing the number of local I/O operations required for distributed BLAST processing.
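To make the I/O argument concrete, the sketch below runs the same external program on one partition in two ways. The first streams records through stdin/stdout in memory, similar in spirit to Spark's `RDD.pipe()`. The second stages input and output through local temporary files, as a disk-based pipeline would. This is not SparkBLAST's or Hadoop's code. `TOY_SCRIPT` is a hypothetical stand-in for a BLAST binary (it prints each query name and its sequence length), and the file-operation counter is only illustrative.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a BLAST executable: reads FASTA on stdin and
# prints "<query>\t<length>" for each single-line sequence.
TOY_SCRIPT = (
    "import sys\n"
    "h = None\n"
    "for line in sys.stdin:\n"
    "    line = line.strip()\n"
    "    if line.startswith('>'):\n"
    "        h = line[1:]\n"
    "    elif h is not None:\n"
    "        print(f'{h}\\t{len(line)}')\n"
)
TOY_COMMAND = [sys.executable, "-c", TOY_SCRIPT]


def to_fasta(records):
    return "".join(f">{h}\n{s}\n" for h, s in records)


def pipe_in_memory(records, command):
    """Stream the partition to the command's stdin and read its stdout directly;
    no intermediate files are written to local disk."""
    proc = subprocess.run(command, input=to_fasta(records),
                          capture_output=True, text=True, check=True)
    return proc.stdout.splitlines()


def pipe_via_files(records, command):
    """Same computation, staged through local temp files.
    Returns (output_lines, number_of_local_file_operations)."""
    file_ops = 0
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "part.fasta")
        out_path = os.path.join(tmp, "part.out")
        with open(in_path, "w") as f:  # write the partition to disk
            f.write(to_fasta(records))
        file_ops += 1
        with open(in_path) as fin, open(out_path, "w") as fout:
            subprocess.run(command, stdin=fin, stdout=fout, check=True)
        file_ops += 2  # child reads the input file and writes the output file
        with open(out_path) as f:  # read results back from disk
            lines = f.read().splitlines()
        file_ops += 1
    return lines, file_ops
```

Both paths produce identical output. Only the staged version touches local disk, four times per partition here. With a real BLAST+ program, `command` would be a BLAST command line instead of `TOY_COMMAND`, since BLAST+ reads the query from standard input when `-query` is not given.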
