Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel.

Affiliation

The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Germany.

Publication information

BMC Bioinformatics. 2012 Aug 9;13:196. doi: 10.1186/1471-2105-13-196.

DOI:10.1186/1471-2105-13-196
PMID:22876807
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3496624/
Abstract

BACKGROUND

Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. The algorithm needs to perform an immense number of pairwise alignments. Therefore, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel.
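To make the cost of "an immense number of pairwise alignments" concrete, here is a minimal sketch of a generic dynamic programming alignment that also counts how many matrix cells it updates. It is not PaPaRa's algorithm: PaPaRa scores reads against phylogenetic profiles, whereas this example compares two plain strings. The function name `align_score` and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration.

```python
def align_score(read, ref, match=2, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty.

    Generic textbook recurrence, NOT PaPaRa's profile-based scoring.
    Returns (score, number_of_cell_updates).
    """
    # Row 0: aligning an empty read prefix against ref[:j] costs j gaps.
    prev = [j * gap for j in range(len(ref) + 1)]
    cells = 0
    for i in range(1, len(read) + 1):
        # Column 0: aligning read[:i] against an empty reference prefix.
        cur = [i * gap] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,  # align the two characters
                         prev[j] + gap,      # gap in the reference
                         cur[j - 1] + gap)   # gap in the read
            cells += 1
        prev = cur
    return prev[-1], cells


score, cells = align_score("ACGT", "AGT")
print(score, cells)
```

Each alignment updates `len(read) * len(ref)` cells, so aligning many reads against many reference profiles multiplies quickly. Because every cell depends only on its three neighbours, independent alignments can be processed side by side, which is the kind of parallelism that vector units and GPUs exploit.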

RESULTS

We optimized and parallelized PaPaRa on CPUs and GPUs. Via SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures and multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores on an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. An NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS.
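The figures above can be put in relation to each other with a little arithmetic. The sketch below defines GCUPS as cell updates divided by elapsed seconds, scaled to billions, and derives two ratios from the reported peaks. The helper name `gcups` and the derived quantities are illustrative assumptions, not values stated in the abstract. The per-cycle figure assumes the nominal 3.4 GHz clock, and the CPU and GPU peaks may not have been measured on the same run, so their sum is only a rough reference point.

```python
def gcups(cell_updates, seconds):
    """Giga Cell Updates per Second: matrix cells updated per second / 1e9."""
    return cell_updates / seconds / 1e9


# Peak values as reported in the abstract (GCUPS).
cpu_peak, gpu_peak, hybrid_peak = 18.1, 22.1, 33.8

# Fraction of the naive sum of both peaks that the hybrid system reaches.
hybrid_fraction = hybrid_peak / (cpu_peak + gpu_peak)

# Cell updates per clock cycle per core: four cores, nominal 3.4 GHz assumed.
updates_per_cycle_per_core = cpu_peak / 4 / 3.4

print(f"hybrid / (CPU + GPU): {hybrid_fraction:.3f}")
print(f"updates per cycle per core: {updates_per_cycle_per_core:.2f}")
```

Under these assumptions the hybrid system reaches roughly 84% of the summed peaks, and each CPU core completes a little more than one cell update per clock cycle, which is only possible because SIMD instructions process several cells at once.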

CONCLUSIONS

This accelerated version of PaPaRa (available at http://www.exelixis-lab.org/software.html) provides a significant performance improvement that allows for analyzing larger datasets in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the "competing programmer approach" is deployed. Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms.
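The abstract does not say which load distribution mechanism the hybrid system uses. As one simple possibility, the sketch below splits a batch of reads between CPU and GPU in proportion to each device's throughput, using the average GCUPS values reported above. The function `split_reads` and the static proportional policy are hypothetical and are not taken from PaPaRa.

```python
def split_reads(num_reads, cpu_gcups, gpu_gcups):
    """Return (cpu_reads, gpu_reads), proportional to device throughput.

    Hypothetical static policy for illustration; not PaPaRa's mechanism.
    """
    cpu_share = cpu_gcups / (cpu_gcups + gpu_gcups)
    cpu_reads = round(num_reads * cpu_share)
    # Give the remainder to the GPU so that every read is assigned once.
    return cpu_reads, num_reads - cpu_reads


# Average throughputs from the abstract: 12.33 GCUPS (CPU), 18.4 GCUPS (GPU).
cpu_reads, gpu_reads = split_reads(1000, 12.33, 18.4)
print(cpu_reads, gpu_reads)
```

A static split like this only balances well when throughput is stable across inputs. Since the reported peak and average values differ noticeably, a dynamic scheme in which each device pulls the next chunk of work when it becomes idle may balance better in practice.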

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e965/3496624/36d294c0c972/1471-2105-13-196-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e965/3496624/598b8e89baaf/1471-2105-13-196-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e965/3496624/478c400a4fda/1471-2105-13-196-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e965/3496624/eaa831ee8bf0/1471-2105-13-196-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e965/3496624/bf3ef423e208/1471-2105-13-196-5.jpg