CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search.

Affiliations

Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany.

NVIDIA Corp., Santa Clara, USA.

Publication information

BMC Bioinformatics. 2024 Nov 2;25(1):342. doi: 10.1186/s12859-024-05965-6.

DOI:10.1186/s12859-024-05965-6

PMID:39488701

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11531700/

Abstract

BACKGROUND

The maximal sensitivity for local pairwise alignment makes the Smith-Waterman algorithm a popular choice for protein sequence database search. However, its quadratic time complexity makes it compute-intensive. Unfortunately, current state-of-the-art software tools are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. This motivates the need for more efficient implementations.
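To make the quadratic cost concrete, here is a minimal sketch of the Smith-Waterman scoring recurrence. It is not the CUDASW++4.0 implementation: the function name `smith_waterman_score`, the match/mismatch scores, and the linear gap penalty are illustrative assumptions, whereas protein search tools typically use a substitution matrix and affine gap penalties. The nested loops visit every (query, subject) residue pair once, which is where the O(m·n) running time comes from.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of two sequences.

    Assumptions for illustration only: a flat match/mismatch score
    and a linear gap penalty. Only two rows of the DP matrix are
    kept, so memory is O(n) while time stays O(m * n).
    """
    prev = [0] * (len(subject) + 1)  # DP row for the previous query residue
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACDEYY", "ZZACDEWW"))
```

Each inner-loop iteration is one "cell update", the unit behind the CUPS throughput figures reported in the results.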

RESULTS

CUDASW++4.0 is a fast software tool for scanning protein sequence databases with the Smith-Waterman algorithm on CUDA-enabled GPUs. Our approach achieves high efficiency for dynamic programming-based alignment computation by minimizing memory accesses and instructions. We provide efficient matrix tiling and sequence database partitioning schemes, and exploit next-generation floating point arithmetic and novel DPX instructions. This leads to close-to-peak performance on modern GPU generations (Ampere, Ada, Hopper), with throughput rates of up to 1.94, 5.01, and 5.71 TCUPS (tera cell updates per second) on an A100, L40S, and H100, respectively. Evaluation on the Swiss-Prot, UniRef50, and TrEMBL databases shows that CUDASW++4.0 achieves performance improvements of more than an order of magnitude over previous GPU-based approaches (CUDASW++3.0, ADEPT, SW#DB). In addition, our algorithm demonstrates significant speedups over top-performing CPU-based tools (BLASTP, SWIPE, SWIMM2.0), can exploit multi-GPU nodes with linear scaling, and features an energy efficiency of up to 15.7 GCUPS/Watt.
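The throughput and energy figures above are expressed in cell updates per second (CUPS). As a rough sketch of how such numbers are derived, the helper below divides the number of dynamic programming cells (query length times total database residues) by the runtime. The function names and the inputs in the usage example are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from the paper.

```python
def cups(query_length, database_residues, seconds):
    """Cell updates per second for one database scan.

    One cell is computed per (query residue, database residue) pair,
    so the cell count is query_length * database_residues.
    """
    return query_length * database_residues / seconds


def to_tcups(cells_per_second):
    """Convert CUPS to tera-CUPS (1e12 cell updates per second)."""
    return cells_per_second / 1e12


def gcups_per_watt(cells_per_second, watts):
    """Energy efficiency in giga-CUPS per watt of average power draw."""
    return cells_per_second / 1e9 / watts


if __name__ == "__main__":
    # Hypothetical inputs: a 1,000-residue query, a database with
    # 2e11 residues, a 40-second scan, and 350 W average power.
    rate = cups(1_000, 200_000_000_000, 40.0)
    print(f"{to_tcups(rate):.2f} TCUPS")
    print(f"{gcups_per_watt(rate, 350.0):.1f} GCUPS/Watt")
```

Because the metric normalizes by problem size, it allows runs with different query lengths and database sizes to be compared on one scale.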

CONCLUSION

CUDASW++4.0 changes the standing of GPUs in protein sequence database search with Smith-Waterman alignment by providing close-to-peak performance on modern GPUs. It is freely available at https://github.com/asbschmidt/CUDASW4 .
