CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search.

Affiliations

Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany.

NVIDIA Corp., Santa Clara, USA.

Publication information

BMC Bioinformatics. 2024 Nov 2;25(1):342. doi: 10.1186/s12859-024-05965-6.

DOI: 10.1186/s12859-024-05965-6
PMID: 39488701
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11531700/
Abstract

BACKGROUND

The maximal sensitivity for local pairwise alignment makes the Smith-Waterman algorithm a popular choice for protein sequence database search. However, its quadratic time complexity makes it compute-intensive. Unfortunately, current state-of-the-art software tools are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. This motivates the need for more efficient implementations.
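To make the quadratic cost concrete, here is a minimal sketch of Smith-Waterman scoring in Python. It is not the CUDASW++4.0 implementation: it assumes a simple match/mismatch score and a linear gap penalty (the function name and default parameters are illustrative choices), whereas protein searches typically use a substitution matrix and affine gaps. The nested loop touches every cell of the query-by-subject matrix once, which is where the O(m·n) time comes from.

```python
def smith_waterman_score(query: str, subject: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (illustrative scoring)."""
    # Only the previous row is kept, so memory is linear in the subject length.
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # local alignment never drops below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

A database search repeats this computation for the query against every database sequence, so the total work grows with the query length times the total number of residues in the database.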

RESULTS

CUDASW++4.0 is a fast software tool for scanning protein sequence databases with the Smith-Waterman algorithm on CUDA-enabled GPUs. Our approach achieves high efficiency for dynamic programming-based alignment computation by minimizing memory accesses and instructions. We provide efficient matrix tiling and sequence database partitioning schemes, and exploit next-generation floating-point arithmetic and novel DPX instructions. This leads to close-to-peak performance on modern GPU generations (Ampere, Ada, Hopper), with throughput rates of up to 1.94, 5.01, and 5.71 TCUPS on an A100, L40S, and H100, respectively. Evaluation on the Swiss-Prot, UniRef50, and TrEMBL databases shows that CUDASW++4.0 achieves more than an order-of-magnitude performance improvement over previous GPU-based approaches (CUDASW++3.0, ADEPT, SW#DB). In addition, our algorithm demonstrates significant speedups over top-performing CPU-based tools (BLASTP, SWIPE, SWIMM2.0), can exploit multi-GPU nodes with linear scaling, and features an energy efficiency of up to 15.7 GCUPS/Watt.
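The throughput figures are given in TCUPS (tera cell updates per second), where one cell update is one entry of the dynamic programming matrix. The sketch below shows how such a rate translates into search time. The peak rates come from the abstract; the helper names and the example workload are hypothetical and are not taken from the paper's benchmarks.

```python
# Peak throughput reported in the abstract, in TCUPS (1e12 cell updates per second).
PEAK_TCUPS = {"A100": 1.94, "L40S": 5.01, "H100": 5.71}

def cell_updates(query_length: int, database_residues: int) -> int:
    """Number of DP cells when one query is aligned against every database residue."""
    return query_length * database_residues

def seconds_at_throughput(cells: int, tcups: float) -> float:
    """Idealized run time if the given throughput were sustained for the whole search."""
    return cells / (tcups * 1e12)

if __name__ == "__main__":
    # Hypothetical workload: a 1,000-residue query against 200 million residues.
    cells = cell_updates(1_000, 200_000_000)
    for gpu, rate in PEAK_TCUPS.items():
        print(f"{gpu}: {seconds_at_throughput(cells, rate):.3f} s at peak rate")
```

These are best-case estimates: they ignore data transfer and other overheads, and sustained throughput on a given database may be below the reported peak.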

CONCLUSION

CUDASW++4.0 changes the standing of GPUs in protein sequence database search with Smith-Waterman alignment by providing close-to-peak performance on modern GPUs. It is freely available at https://github.com/asbschmidt/CUDASW4 .

Similar articles

1. CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search.
   BMC Bioinformatics. 2024 Nov 2;25(1):342. doi: 10.1186/s12859-024-05965-6.
2. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.
   BMC Bioinformatics. 2013 Apr 4;14:117. doi: 10.1186/1471-2105-14-117.
3. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.
   BMC Res Notes. 2010 Apr 6;3:93. doi: 10.1186/1756-0500-3-93.
4. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units.
   BMC Res Notes. 2009 May 6;2:73. doi: 10.1186/1756-0500-2-73.
5. ADEPT: a domain independent sequence alignment strategy for gpu architectures.
   BMC Bioinformatics. 2020 Sep 15;21(1):406. doi: 10.1186/s12859-020-03720-1.
6. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.
   Biomed Res Int. 2015;2015:185179. doi: 10.1155/2015/185179. Epub 2015 Aug 3.
7. GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda.
   BMC Med Genomics. 2014;7 Suppl 1(Suppl 1):S9. doi: 10.1186/1755-8794-7-S1-S9. Epub 2014 May 8.
8. Pairwise sequence alignment for very long sequences on GPUs.
   Int J Bioinform Res Appl. 2014;10(4-5):345-68. doi: 10.1504/IJBRA.2014.062989.
9. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.
   BMC Bioinformatics. 2011 May 20;12:181. doi: 10.1186/1471-2105-12-181.
10. CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment.
    BMC Bioinformatics. 2008 Mar 26;9 Suppl 2(Suppl 2):S10. doi: 10.1186/1471-2105-9-S2-S10.

Cited by

1. GPU-accelerated homology search with MMseqs2.
   Nat Methods. 2025 Sep 18. doi: 10.1038/s41592-025-02819-8.
2. RabbitSketch: a high-performance sketching library for genome analysis.
   Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf249.
3. QuickEd: high-performance exact sequence alignment based on bound-and-align.