BSAlign: A Library for Nucleotide Sequence Alignment

Affiliation

Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China.

Publication information

Genomics Proteomics Bioinformatics. 2024 Jul 3;22(2). doi: 10.1093/gpbjnl/qzae025.

DOI: 10.1093/gpbjnl/qzae025
PMID: 39209796
Abstract

Increasing the accuracy of nucleotide sequence alignment is an essential issue in genomics research. Although classic dynamic programming (DP) algorithms (e.g., Smith-Waterman and Needleman-Wunsch) are guaranteed to produce the optimal result, their time complexity hinders their application to large-scale sequence alignment. Efforts to accelerate the alignment process generally take one of three approaches: redesigning data structures [e.g., diagonal or striped Single Instruction Multiple Data (SIMD) implementations], increasing the degree of parallelism in SIMD operations (e.g., the difference recurrence relation), or reducing the search space (e.g., banded DP). However, no existing method combines all three approaches to build an ultra-fast algorithm. In this study, we developed the Banded Striped Aligner (BSAlign) library, which delivers accurate alignment results at ultra-fast speed by combining a series of novel methods that exploit all three approaches, with highlights such as an active F-loop in striped vectorization and a striped move in banded DP. We applied the new acceleration design to both regular and edit distance-based pairwise alignment. BSAlign achieved a 2-fold speed-up over other SIMD-based implementations for regular pairwise alignment, and a 1.5-fold to 4-fold speed-up over edit distance-based implementations for long reads. BSAlign is implemented in the C programming language and is available at https://github.com/ruanjue/bsalign.
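
To make the "reducing the search space" idea concrete, here is a minimal scalar sketch of banded DP for edit distance. It is not BSAlign's code or API: the function name banded_edit_distance and the band parameter are hypothetical, and it uses none of the SIMD, striped-layout, or active F-loop techniques named in the abstract. It only shows how restricting the DP to cells with |i - j| <= band reduces the work from O(n*m) to roughly O(n*band).

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of BSAlign.
 * Returns the edit distance between a and b if it is <= band,
 * otherwise -1. Only cells with |i - j| <= band are evaluated. */
int banded_edit_distance(const char *a, const char *b, int band)
{
    assert(band >= 0);
    int n = (int)strlen(a);
    int m = (int)strlen(b);
    const int INF = INT_MAX / 2;   /* large, but INF + 1 cannot overflow */

    /* The final cell (n, m) lies outside the band: distance > band. */
    if (abs(n - m) > band) return -1;

    int *prev = malloc((size_t)(m + 1) * sizeof *prev);
    int *curr = malloc((size_t)(m + 1) * sizeof *curr);
    if (!prev || !curr) { free(prev); free(curr); return -1; }

    /* Row 0: j insertions inside the band, "unreachable" outside it. */
    for (int j = 0; j <= m; j++) {
        prev[j] = (j <= band) ? j : INF;
        curr[j] = INF;
    }

    for (int i = 1; i <= n; i++) {
        int lo = (i - band > 1) ? i - band : 1;   /* first in-band column */
        int hi = (i + band < m) ? i + band : m;   /* last in-band column  */

        /* Cell left of the band: column 0 costs i deletions while it is
         * still in the band; otherwise it is marked unreachable, which
         * also clears the stale value left from two rows earlier. */
        curr[lo - 1] = (i <= band) ? i : INF;

        for (int j = lo; j <= hi; j++) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;       /* INF if (i-1, j) is out of band */
            int ins = curr[j - 1] + 1;
            int best = sub < del ? sub : del;
            curr[j] = best < ins ? best : ins;
        }

        int *tmp = prev; prev = curr; curr = tmp;  /* roll the two rows */
    }

    int result = prev[m];
    free(prev);
    free(curr);
    /* A path with d edits never leaves the band of width d, so the
     * banded result is exact whenever it does not exceed the band. */
    return (result <= band) ? result : -1;
}
```

For example, under these assumptions banded_edit_distance("kitten", "sitting", 3) evaluates to 3, while the same call with band 2 returns -1 because the true distance exceeds the band. A caller that needs an exact answer can retry with a doubled band until the result is non-negative.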

