CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.

Author information

Jiang Hanyu, Ganesan Narayan

Affiliations

Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, 07030, USA.

Publication information

BMC Bioinformatics. 2016 Feb 27;17:106. doi: 10.1186/s12859-016-0946-4.

DOI: 10.1186/s12859-016-0946-4

PMID: 26920848

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4769571/

Abstract

BACKGROUND

The HMMER software suite is widely used for high-sensitivity analysis of homologous protein and nucleotide sequences. The latest version of hmmsearch in HMMER 3.x uses a heuristic pipeline consisting of an MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, a P7Viterbi stage and a Forward scoring stage to accelerate homology detection. Because the latest version is already highly optimized for modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report a speedup. However, the most compute-intensive tasks in the pipeline (viz., the MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors.

RESULTS

The Multi-Tiered Parallel Framework (CUDAMPF) presented here, implemented on CUDA-enabled GPUs, offers finer-grained parallelism for the MSV/SSV and Viterbi algorithms. We couple the SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instruction Multiple Data) video instructions under warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware scheme for optimally allocating scarce resources such as on-chip memory and caches, in order to boost the performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC, available with CUDA 7.0, is incorporated into the framework; it not only helps unroll the innermost loop, yielding up to a 2- to 3-fold speedup over static compilation, but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance.
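
As a rough illustration of what "SIMD video instructions" mean in practice, the sketch below emulates on the CPU three per-byte operations that CUDA exposes as intrinsics (`__vmaxu4`, `__vaddus4`, `__vsubus4`), each of which treats one 32-bit word as four independent unsigned 8-bit lanes. Whether CUDAMPF uses exactly these intrinsics is an assumption, and `update_cells` is a hypothetical, simplified ungapped score step written for this example only; it is not the kernel from the paper.

```python
def pack4(lanes):
    """Pack four 8-bit values (lane 0 = least significant byte) into one 32-bit word."""
    assert len(lanes) == 4 and all(0 <= v <= 255 for v in lanes)
    return lanes[0] | (lanes[1] << 8) | (lanes[2] << 16) | (lanes[3] << 24)


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def vmaxu4(a, b):
    """Per-byte unsigned maximum (CPU emulation of the __vmaxu4 semantics)."""
    return pack4([max(x, y) for x, y in zip(unpack4(a), unpack4(b))])


def vaddus4(a, b):
    """Per-byte unsigned add, saturating at 255 (emulates __vaddus4)."""
    return pack4([min(x + y, 255) for x, y in zip(unpack4(a), unpack4(b))])


def vsubus4(a, b):
    """Per-byte unsigned subtract, saturating at 0 (emulates __vsubus4)."""
    return pack4([max(x - y, 0) for x, y in zip(unpack4(a), unpack4(b))])


def update_cells(prev, begin, bias, cost):
    """Hypothetical simplified score step applied to four 8-bit cells at once."""
    s = vmaxu4(prev, begin)   # extend the previous cell or start a new segment
    s = vaddus4(s, bias)      # add the bias, clamping at 255
    return vsubus4(s, cost)   # subtract the biased match cost, clamping at 0


# Illustrative values only; they are not taken from the paper.
prev = pack4([10, 250, 0, 100])
begin = pack4([20, 20, 20, 20])
bias = pack4([10, 10, 10, 10])
cost = pack4([5, 5, 40, 5])
result = unpack4(update_cells(prev, begin, bias, cost))
print(result)  # [25, 250, 0, 105]
```

On a GPU, every thread of a warp would issue such an instruction on its own word, so a 32-thread warp would update 128 eight-bit cells per instruction. That is one way to read the coupling of SIMT with SIMD described above.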

CONCLUSIONS

CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on a single GPU and takes full advantage of limited resources according to their individual performance characteristics. CUDAMPF outperforms other acceleration attempts, and comprehensive evaluations against high-end CPUs (Intel i5, i7 and Xeon) show that it yields up to 440 GCUPS for SSV, 277 GCUPS for MSV and 14.3 GCUPS for P7Viterbi, all with 100 % accuracy, which translates to a maximum speedup of 37.5, 23.1 and 11.6-fold for MSV, SSV and P7Viterbi respectively. The source code is available at https://github.com/Super-Hippo/CUDAMPF.
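
The throughput figures above are in GCUPS (giga cell updates per second). As commonly defined for profile searches, this is the number of dynamic-programming cells, i.e. model length times total database residues, divided by the run time in seconds and by 10^9. The sketch below shows that arithmetic; the function names and the input numbers are hypothetical and are not measurements from the paper.

```python
def gcups(model_length: int, total_residues: int, seconds: float) -> float:
    """Giga cell updates per second for one profile-vs-database search."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = model_length * total_residues  # one DP cell per (model position, residue)
    return cells / seconds / 1e9


def speedup(accelerated_gcups: float, baseline_gcups: float) -> float:
    """Speedup expressed as the ratio of two throughputs."""
    if baseline_gcups <= 0:
        raise ValueError("baseline_gcups must be positive")
    return accelerated_gcups / baseline_gcups


# Hypothetical run: a 400-position model against 10^9 residues in 2 seconds.
example = gcups(model_length=400, total_residues=1_000_000_000, seconds=2.0)
print(example)                  # 200.0
print(speedup(example, 10.0))   # 20.0
```

Because GCUPS normalizes by problem size, it allows runs with different model lengths and databases to be compared, which is presumably why the abstract reports throughput per stage alongside the fold speedups.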
