Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices.

Affiliations

University of California Los Angeles, Los Angeles, CA, United States.

School of Biological Sciences, Artificial Intelligence Institute, Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea.

Publication information

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad487.

DOI: 10.1093/bioinformatics/btad487

PMID: 37535681

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10457662/

Abstract

MOTIVATION

Efficiently aligning sequences is a fundamental problem in bioinformatics. Many recent algorithms for computing alignments through Smith-Waterman-Gotoh dynamic programming (DP) exploit Single Instruction Multiple Data (SIMD) operations on modern CPUs for speed. However, these advances have largely ignored difficulties associated with efficiently handling complex scoring matrices or large gaps (insertions or deletions).
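
For orientation, the Smith-Waterman-Gotoh recurrence mentioned above can be written as a plain scalar routine. The following is a minimal sketch of the global, affine-gap variant, not code from Block Aligner: the function name `gotoh_global_score`, its parameters, and the convention that a gap of length k scores `open + (k - 1) * extend` are all assumptions made for illustration. SIMD aligners vectorize this same recurrence over many cells at once.

```rust
/// Sentinel for "no alignment ends in this state" (assumed far below any real score).
const NEG_INF: i32 = i32::MIN / 2;

/// Global alignment score with affine gaps (Gotoh), using two rolling rows.
/// `open` and `extend` are negative scores; a gap of length k scores
/// `open + (k - 1) * extend` (an assumed convention for this sketch).
fn gotoh_global_score(a: &[u8], b: &[u8], mat: i32, mis: i32, open: i32, extend: i32) -> i32 {
    let m = b.len();
    // h[j]: best score for the current row; f[j]: best score ending in a vertical gap.
    let mut h = vec![NEG_INF; m + 1];
    let mut f = vec![NEG_INF; m + 1];
    h[0] = 0;
    for j in 1..=m {
        h[j] = open + (j as i32 - 1) * extend; // first row: one leading gap
    }
    for i in 1..=a.len() {
        let mut diag = h[0]; // H(i-1, j-1)
        h[0] = open + (i as i32 - 1) * extend; // first column: one leading gap
        let mut e = NEG_INF; // best score ending in a horizontal gap in this row
        for j in 1..=m {
            e = (e + extend).max(h[j - 1] + open); // extend or open a horizontal gap
            f[j] = (f[j] + extend).max(h[j] + open); // h[j] still holds row i-1 here
            let s = if a[i - 1] == b[j - 1] { mat } else { mis };
            let best = (diag + s).max(e).max(f[j]);
            diag = h[j];
            h[j] = best;
        }
    }
    h[m]
}
```

Calling `gotoh_global_score(b"ACGTTTACGT", b"ACGTACGT", 2, -3, -5, -1)` scores eight matches and one gap of length two. Every cell of the DP matrix is visited, which is the quadratic cost that banded and block-based heuristics try to avoid.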

RESULTS

We propose a new SIMD-accelerated algorithm called Block Aligner for aligning nucleotide and protein sequences against other sequences or position-specific scoring matrices. We introduce a new paradigm that uses blocks in the DP matrix that greedily shift, grow, and shrink. This approach allows regions of the DP matrix to be adaptively computed. Our algorithm is over 5-10 times faster than some previous methods while incurring an error rate of less than 3% on protein and long read datasets, despite large gaps and low sequence identities.
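
The greedy shifting idea can be illustrated with a deliberately simplified toy. The sketch below is not the authors' implementation and makes several assumptions: it is scalar rather than SIMD, uses a linear gap penalty, keeps the block size fixed (no growing or shrinking), and decides the shift direction by comparing the maximum score on the block's right border with the maximum on its bottom border. The names `block_shift_score` and `UNREACHED` are hypothetical.

```rust
/// Marker for DP cells that no block has reached (assumed far below any real score).
const UNREACHED: i32 = i32::MIN / 2;

/// Toy greedy block-shifting global alignment score with a linear gap penalty.
/// `block` is the fixed block side length; `gap` is a negative per-residue score.
fn block_shift_score(a: &[u8], b: &[u8], block: usize, mat: i32, mis: i32, gap: i32) -> i32 {
    let (n, m) = (a.len(), b.len());
    let block = block.max(1);
    let step = (block / 2).max(1);
    // A full matrix is allocated only to keep the sketch short; cells outside
    // every visited block are never computed and stay UNREACHED.
    let mut dp = vec![vec![UNREACHED; m + 1]; n + 1];
    dp[0][0] = 0;
    let (mut r0, mut c0) = (0usize, 0usize); // top-left corner of the block
    loop {
        // Inclusive bottom-right corner, clamped to the matrix.
        let (r1, c1) = ((r0 + block).min(n), (c0 + block).min(m));
        for i in r0..=r1 {
            for j in c0..=c1 {
                if i == 0 && j == 0 {
                    continue;
                }
                let mut best = UNREACHED;
                if i > 0 && j > 0 && dp[i - 1][j - 1] > UNREACHED {
                    let s = if a[i - 1] == b[j - 1] { mat } else { mis };
                    best = best.max(dp[i - 1][j - 1] + s);
                }
                if i > 0 && dp[i - 1][j] > UNREACHED {
                    best = best.max(dp[i - 1][j] + gap);
                }
                if j > 0 && dp[i][j - 1] > UNREACHED {
                    best = best.max(dp[i][j - 1] + gap);
                }
                dp[i][j] = best;
            }
        }
        if r1 == n && c1 == m {
            break; // the block now covers the final cell
        }
        // Greedy choice: move toward the border holding the larger score.
        let right = (r0..=r1).map(|i| dp[i][c1]).max().unwrap();
        let bottom = (c0..=c1).map(|j| dp[r1][j]).max().unwrap();
        if r1 == n || (c1 < m && right > bottom) {
            c0 += step; // shift right
        } else {
            r0 += step; // shift down
        }
    }
    dp[n][m]
}
```

With identical inputs the block zigzags along the main diagonal, and with a block at least as large as both sequences the result equals the exact Needleman-Wunsch score. In between, the score is a lower bound on the optimum because only cells inside visited blocks contribute, which is why a fixed small block can miss alignments with large gaps.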

AVAILABILITY AND IMPLEMENTATION

Our algorithm is implemented for global, local, and X-drop alignments. It is available as a Rust library (with C bindings) at https://github.com/Daniel-Liu-c0deb0t/block-aligner.
