muBLASTP: database-indexed protein sequence search on multicore CPUs.

Author information

Zhang Jing, Misra Sanchit, Wang Hao, Feng Wu-Chun

Affiliations

Department of Computer Science, Virginia Tech, 225 Stanger Street, Blacksburg, 24060, VA, USA.

Parallel Computing Lab, Intel Corporation, Bengaluru, Karnataka, 560102, India.

Publication information

BMC Bioinformatics. 2016 Nov 4;17(1):443. doi: 10.1186/s12859-016-1302-4.

Abstract

BACKGROUND

The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for the sequences most similar to a query sequence. Currently, the BLAST algorithm uses a query-indexed approach. Several tools (e.g., BLAT, SSAHA, and CAFE) suggest that sequence search with a database index can achieve much higher throughput, but they either cannot deliver the same level of sensitivity as query-indexed BLAST (i.e., NCBI BLAST) or support only nucleotide sequence search (e.g., MegaBLAST). Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied to database-indexed search.
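To make the idea of database indexing concrete, the following is a minimal sketch, not the muBLASTP index structure. It assumes exact-match words of length 3 and toy sequences; real BLASTP also matches high-scoring neighboring words, which is omitted here. The function names `build_database_index` and `find_seed_hits` are hypothetical. The index is built once from the database and reused for every query, whereas a query-indexed search builds its lookup table from the query and scans the database.

```python
from collections import defaultdict

K = 3  # word length; an assumption for illustration


def build_database_index(database):
    """Map each k-mer to the (sequence id, offset) positions where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for pos in range(len(seq) - K + 1):
            index[seq[pos:pos + K]].append((seq_id, pos))
    return index


def find_seed_hits(query, index):
    """Return (query offset, sequence id, subject offset) for each exact k-mer match."""
    hits = []
    for q_pos in range(len(query) - K + 1):
        word = query[q_pos:q_pos + K]
        # .get avoids inserting empty entries into the defaultdict
        for seq_id, s_pos in index.get(word, ()):
            hits.append((q_pos, seq_id, s_pos))
    return hits


# Toy protein sequences, invented for this example (not real data).
database = ["MKVLAAGIV", "GIVKQLAAG"]
index = build_database_index(database)
hits = find_seed_hits("LAAGIV", index)
```

Each hit is only a seed; in a BLAST-style pipeline, later stages would extend seeds into scored alignments.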

RESULTS

muBLASTP, a novel database-indexed BLAST for protein sequence search, returns hits identical to those of NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, single-threaded muBLASTP achieves up to a 4.41-fold speedup for the alignment stages and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, multithreaded muBLASTP achieves up to a 5.7-fold speedup for the alignment stages and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST.

CONCLUSIONS

With a newly designed index structure for protein databases and associated optimizations to the BLASTP algorithm, we refactored BLASTP for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec67/5096327/e2467c105bab/12859_2016_1302_Fig1_HTML.jpg
