BIMSA: accelerating long sequence alignment using processing-in-memory.

Affiliations

Department of Computer Sciences, Barcelona Supercomputing Center, Barcelona 08034, Spain.

Department of Computer Science, Universitat Politècnica de Catalunya, Barcelona 08034, Spain.

Publication information

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae631.

DOI: 10.1093/bioinformatics/btae631
PMID: 39432682
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11576351/
Abstract

MOTIVATION

Recent advances in sequencing technologies have stressed the critical role of sequence analysis algorithms and tools in genomics and healthcare research. In particular, sequence alignment is a fundamental building block in many sequence analysis pipelines and is frequently a performance bottleneck both in terms of execution time and memory usage. Classical sequence alignment algorithms are based on dynamic programming and often require quadratic time and memory with respect to the sequence length. As a result, classical sequence alignment algorithms fail to scale with increasing sequence lengths and quickly become memory-bound due to data-movement penalties.
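To make the quadratic cost concrete, here is a minimal sketch of a classical dynamic-programming alignment, using unit-cost edit distance as a stand-in for the scoring schemes real aligners use. The function name `edit_distance_dp` and the cell counter are illustrative assumptions, not taken from the paper.

```python
def edit_distance_dp(a: str, b: str) -> tuple[int, int]:
    """Return (edit distance, number of DP cells stored) for strings a and b.

    Illustrative only: unit costs for mismatch, insertion, and deletion.
    """
    n, m = len(a), len(b)
    # The full (n + 1) x (m + 1) table is what makes time and memory quadratic.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = i  # delete the first i characters of a
    for j in range(m + 1):
        table[0][j] = j  # insert the first j characters of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,             # deletion
                table[i][j - 1] + 1,             # insertion
                table[i - 1][j - 1] + mismatch,  # match or mismatch
            )
    return table[n][m], (n + 1) * (m + 1)


distance, cells = edit_distance_dp("kitten", "sitting")
print(distance, cells)  # 3 56
```

For two sequences of 100K bases each, the same table would hold roughly 10^10 cells, which illustrates why the abstract describes the classical approach as failing to scale.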

RESULTS

Processing-In-Memory (PIM) is an emerging architectural paradigm that seeks to accelerate memory-bound algorithms by bringing computation closer to the data to mitigate data-movement penalties. This work presents BIMSA (Bidirectional In-Memory Sequence Alignment), a PIM design and implementation for the state-of-the-art sequence alignment algorithm BiWFA (Bidirectional Wavefront Alignment), incorporating new hardware-aware optimizations for a production-ready PIM architecture (UPMEM). BIMSA supports aligning sequences up to 100K bases, exceeding the limitations of state-of-the-art PIM implementations. First, BIMSA achieves speedups up to 22.24× (11.95× on average) compared to state-of-the-art PIM-enabled implementations of sequence alignment algorithms. Second, BIMSA achieves speedups up to 5.84× (2.83× on average) compared to the highest-performance multicore CPU implementation of BiWFA. Third, BIMSA exhibits linear scalability with the number of compute units in memory, enabling further performance improvements with upcoming PIM architectures equipped with more compute units and achieving speedups up to 9.56× (4.7× on average).
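The wavefront idea underlying BiWFA can be sketched in a few lines. The example below is a simplified assumption, not BIMSA's or BiWFA's implementation: it runs in one direction only, uses unit-cost edit distance rather than gap-affine penalties, returns only the score, and has no PIM-specific code. The names `edit_distance_wavefront` and `extend` are hypothetical. For each score s it stores only the furthest offset reached on each diagonal, so the work depends on the alignment score rather than on the product of the sequence lengths.

```python
def edit_distance_wavefront(a: str, b: str) -> tuple[int, int]:
    """Return (edit distance, number of diagonal offsets computed).

    Simplified, unidirectional wavefront sketch with unit costs.
    Diagonal k holds cells (i, j) with j - i == k; each wavefront maps
    k to the furthest i reachable on that diagonal with the current score.
    """
    n, m = len(a), len(b)
    target_k = m - n  # diagonal that contains the final cell (n, m)

    def extend(i: int, k: int) -> int:
        """Slide along diagonal k for free while characters match."""
        j = i + k
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return i

    wavefront = {0: extend(0, 0)}
    score = 0
    offsets = 1
    while wavefront.get(target_k, -1) < n:
        score += 1
        nxt = {}
        for k in range(min(wavefront) - 1, max(wavefront) + 2):
            candidates = []
            if k in wavefront:
                candidates.append(wavefront[k] + 1)      # mismatch
            if k + 1 in wavefront:
                candidates.append(wavefront[k + 1] + 1)  # deletion (consume a[i])
            if k - 1 in wavefront:
                candidates.append(wavefront[k - 1])      # insertion (consume b[j])
            if not candidates:
                continue
            # Clamp to the matrix borders: i <= n and j = i + k <= m.
            best = min(max(candidates), n, m - k)
            if best < 0 or best + k < 0:
                continue  # diagonal lies outside the matrix
            nxt[k] = extend(best, k)
        wavefront = nxt
        offsets += len(nxt)
    return score, offsets


print(edit_distance_wavefront("kitten", "sitting")[0])  # 3
```

Because only the previous wavefront is kept, this sketch needs memory proportional to the score, which is the property that makes wavefront-style algorithms attractive for long, similar sequences. Recovering the full alignment in that little memory is the problem BiWFA's bidirectional scheme addresses.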

AVAILABILITY AND IMPLEMENTATION

Code and documentation are publicly available at https://github.com/AlejandroAMarin/BIMSA.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9763/11576351/953e6ad044b0/btae631f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9763/11576351/c59ba8fd1e31/btae631f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9763/11576351/3057e8bbefc9/btae631f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9763/11576351/7cbf3c51b4c9/btae631f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9763/11576351/0486aea5135d/btae631f5.jpg
