
ADEPT: a domain independent sequence alignment strategy for GPU architectures.

Affiliation

Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA.

Publication information

BMC Bioinformatics. 2020 Sep 15;21(1):406. doi: 10.1186/s12859-020-03720-1.

DOI:10.1186/s12859-020-03720-1
PMID:32933482
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7493400/
Abstract

BACKGROUND

Bioinformatic workflows frequently make use of automated genome assembly and protein clustering tools. At the core of most of these tools, a significant portion of execution time is spent determining the optimal local alignment between two sequences. This task is performed with the Smith-Waterman algorithm, a method based on dynamic programming. With the advent of modern sequencing technologies and the increasing size of both genome and protein databases, a need for faster Smith-Waterman implementations has emerged. Multiple SIMD strategies for the Smith-Waterman algorithm are available for CPUs. However, with the move of HPC facilities towards accelerator-based architectures, a need for an efficient GPU-accelerated strategy has emerged. Existing GPU-based strategies have been optimized either for a specific type of character (nucleotides or amino acids) or for only a handful of application use cases.
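As a point of reference for the dynamic programming recurrence mentioned above, the following is a minimal CPU sketch of Smith-Waterman local alignment scoring. It is not ADEPT's GPU implementation. The function name `smith_waterman_score`, the linear gap penalty, and the default scores are assumptions chosen for illustration; protein alignments would normally replace the match/mismatch rule with a substitution matrix such as BLOSUM62.

```python
def smith_waterman_score(a, b, match=3, mismatch=-3, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative sketch: linear gap penalty, score only (no traceback).
    The scoring parameters are hypothetical defaults, not values from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Because the rule compares symbols only for equality, the same routine accepts nucleotide or amino-acid strings. Each cell depends on its upper, left, and upper-left neighbours, so the work per pair grows with the product of the two sequence lengths.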

RESULTS

In this paper, we present ADEPT, a new sequence alignment strategy for GPU architectures that is domain independent, supporting alignment of sequences from both genomes and proteins. Our proposed strategy uses GPU-specific optimizations that do not rely on the nature of the sequence. We demonstrate the feasibility of this strategy by implementing the Smith-Waterman algorithm and comparing it to similar CPU strategies as well as the fastest known GPU methods for each domain. ADEPT's driver enables it to scale across multiple GPUs and allows easy integration into software pipelines that use large-scale computational systems. We have shown that the ADEPT-based Smith-Waterman algorithm demonstrates a peak performance of 360 GCUPS and 497 GCUPS for protein-based and DNA-based datasets, respectively, on a single GPU node (8 GPUs) of the Cori supercomputer. Overall, ADEPT shows 10x faster performance in a node-to-node comparison against a corresponding SIMD CPU implementation.
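The GCUPS figures above count giga cell updates per second, that is, how many billion dynamic programming matrix cells are filled per second. A minimal sketch of the conventional calculation follows; the helper name `gcups` and the synthetic batch are assumptions for illustration, not part of ADEPT.

```python
def gcups(pairs, seconds):
    """Giga cell updates per second for a batch of (query, reference) pairs.

    Each pair contributes len(query) * len(reference) matrix cells.
    Hypothetical helper for illustration only.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = sum(len(q) * len(r) for q, r in pairs)
    return cells / seconds / 1e9


if __name__ == "__main__":
    # Synthetic batch: 1000 pairs of 1000 x 1000 cells aligned in one second.
    batch = [("A" * 1000, "C" * 1000)] * 1000
    print(gcups(batch, 1.0))
```

Normalizing by cell count rather than by number of alignments lets throughput be compared across datasets with different sequence lengths.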

CONCLUSIONS

ADEPT demonstrates performance that is comparable to or better than existing GPU strategies. We demonstrated the efficacy of ADEPT in supporting existing bioinformatics software pipelines by integrating it into MetaHipMer, a high-performance de novo metagenome assembler, and PASTIS, a high-performance protein similarity graph construction pipeline. Our results show performance boosts of 10% and 30% in MetaHipMer and PASTIS, respectively.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/e18897554709/12859_2020_3720_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/341295a7dd29/12859_2020_3720_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/d687d2110ad0/12859_2020_3720_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/ce192e8456aa/12859_2020_3720_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/73956ea8a3ce/12859_2020_3720_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/5a58d839a0db/12859_2020_3720_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/88a2b910f92e/12859_2020_3720_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/db9c9993d701/12859_2020_3720_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/6e7e8e0bb908/12859_2020_3720_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/8ee4e0886dbc/12859_2020_3720_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/fe20f11e1106/12859_2020_3720_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/9225e4ff7262/12859_2020_3720_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/8da6f0071340/12859_2020_3720_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/2064103f8605/12859_2020_3720_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/5c768cdce118/12859_2020_3720_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/ae9d6a039676/12859_2020_3720_Fig16_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/5216e64bebf1/12859_2020_3720_Fig17_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/1f937203c143/12859_2020_3720_Fig18_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/ca2f6b1a206b/12859_2020_3720_Fig19_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/c37de9ab8c5e/12859_2020_3720_Fig20_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f1e/7493400/54a8226a1f32/12859_2020_3720_Fig21_HTML.jpg

Similar articles

1
ADEPT: a domain independent sequence alignment strategy for GPU architectures.
BMC Bioinformatics. 2020 Sep 15;21(1):406. doi: 10.1186/s12859-020-03720-1.
2
CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.
BMC Bioinformatics. 2013 Apr 4;14:117. doi: 10.1186/1471-2105-14-117.
3
Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel.
BMC Bioinformatics. 2012 Aug 9;13:196. doi: 10.1186/1471-2105-13-196.
4
SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences.
BMC Syst Biol. 2018 Nov 20;12(Suppl 5):96. doi: 10.1186/s12918-018-0614-6.
5
CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment.
BMC Bioinformatics. 2008 Mar 26;9 Suppl 2(Suppl 2):S10. doi: 10.1186/1471-2105-9-S2-S10.
6
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.
BMC Bioinformatics. 2016 Jul 19;17 Suppl 9(Suppl 9):267. doi: 10.1186/s12859-016-1128-0.
7
Pairwise sequence alignment for very long sequences on GPUs.
Int J Bioinform Res Appl. 2014;10(4-5):345-68. doi: 10.1504/IJBRA.2014.062989.
8
Accelerating the Smith-Waterman algorithm with interpair pruning and band optimization for the all-pairs comparison of base sequences.
BMC Bioinformatics. 2015 Oct 6;16:321. doi: 10.1186/s12859-015-0744-4.
9
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.
BMC Bioinformatics. 2011 May 20;12:181. doi: 10.1186/1471-2105-12-181.
10
Speeding-up Bioinformatics Algorithms with Heterogeneous Architectures: Highly Heterogeneous Smith-Waterman (HHeterSW).
J Comput Biol. 2016 Oct;23(10):801-9. doi: 10.1089/cmb.2015.0237. Epub 2016 Apr 22.

Cited by

1
CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search.
BMC Bioinformatics. 2024 Nov 2;25(1):342. doi: 10.1186/s12859-024-05965-6.
2
WFA-GPU: gap-affine pairwise read-alignment using GPUs.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad701.
3
GPU-acceleration of the distributed-memory database peptide search of mass spectrometry data.
