• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

引入差异递归关系以加快长序列的半全局比对。

Introducing difference recurrence relations for faster semi-global alignment of long sequences.

机构信息

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa City, Chiba, Japan.

出版信息

BMC Bioinformatics. 2018 Feb 19;19(Suppl 1):45. doi: 10.1186/s12859-018-2014-8.

DOI:10.1186/s12859-018-2014-8
PMID:29504909
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC5836832/
Abstract

BACKGROUND

The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors.

RESULTS

We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs.

CONCLUSIONS

Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .

摘要

背景

单分子 DNA 测序仪的读长已达到 1Mb。目前广泛应用于分析此类长读长的流行对齐软件工具通常利用单指令多数据(SIMD)操作,以加快在原点固定对齐起始位置的 Smith-Waterman-Gotoh(SWG)算法中的动态规划(DP)矩阵的计算。然而,当待对齐的序列较长时,用于存储 DP 矩阵中值的 16 位或 32 位整数会阻碍现代处理器充分利用其 SIMD 宽度。

结果

我们提出了一种更快的半全局对齐算法“差值递归关系”,其运行速度比最先进的算法快 2.1 倍。与直接计算和存储 DP 矩阵中的所有值不同,我们的算法主要计算和存储矩阵中相邻单元格值之间的差异。虽然 SWG 算法和我们的算法可以输出完全相同的结果,但我们的算法主要涉及 8 位整数运算,使我们能够充分利用现代处理器的 SIMD 操作全宽(例如 32)。我们还开发了一个库 libgaba,以便开发人员可以轻松地将我们的算法集成到对齐程序中。

结论

我们的新颖算法和优化的库实现将有助于加速使用成对对齐阶段的核苷酸长读分析算法。该库是用 C 编程语言实现的,并可在 https://github.com/ocxtal/libgaba 上获得。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/b499b2736cdc/12859_2018_2014_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/5618603fbaab/12859_2018_2014_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/6f9f60ffaa31/12859_2018_2014_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/0872844c2c3a/12859_2018_2014_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/b499b2736cdc/12859_2018_2014_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/5618603fbaab/12859_2018_2014_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/6f9f60ffaa31/12859_2018_2014_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/0872844c2c3a/12859_2018_2014_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/5836832/b499b2736cdc/12859_2018_2014_Fig4_HTML.jpg

相似文献

1
Introducing difference recurrence relations for faster semi-global alignment of long sequences.引入差异递归关系以加快长序列的半全局比对。
BMC Bioinformatics. 2018 Feb 19;19(Suppl 1):45. doi: 10.1186/s12859-018-2014-8.
2
BSAlign: A Library for Nucleotide Sequence Alignment.BSAlign:一个核苷酸序列比对库。
Genomics Proteomics Bioinformatics. 2024 Jul 3;22(2). doi: 10.1093/gpbjnl/qzae025.
3
Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices.块对齐器:一种自适应的 SIMD 加速序列和位置特定评分矩阵的对齐器。
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad487.
4
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.Parasail:用于全局、半全局和局部成对序列比对的SIMD C库。
BMC Bioinformatics. 2016 Feb 10;17:81. doi: 10.1186/s12859-016-0930-z.
5
Pairwise alignment of nucleotide sequences using maximal exact matches.使用最大完全匹配进行核苷酸序列的两两比对。
BMC Bioinformatics. 2019 May 21;20(1):261. doi: 10.1186/s12859-019-2827-0.
6
Fast gap-affine pairwise alignment using the wavefront algorithm.基于波前算法的快速间隙亲和双序列比对。
Bioinformatics. 2021 May 1;37(4):456-463. doi: 10.1093/bioinformatics/btaa777.
7
SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications.SSW 库:一个用于基因组应用的 SIMD Smith-Waterman C/C++ 库。
PLoS One. 2013 Dec 4;8(12):e82138. doi: 10.1371/journal.pone.0082138. eCollection 2013.
8
abPOA: an SIMD-based C library for fast partial order alignment using adaptive band.abPOA:一个基于 SIMD 的 C 库,用于使用自适应带实现快速偏序比对。
Bioinformatics. 2021 Aug 9;37(15):2209-2211. doi: 10.1093/bioinformatics/btaa963.
9
Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel.将 SIMD 和 SIMT 架构进行耦合以提高具有系统发育感知的对齐核的性能。
BMC Bioinformatics. 2012 Aug 9;13:196. doi: 10.1186/1471-2105-13-196.
10
Pairwise alignment for very long nucleic acid sequences.非常长的核酸序列的两两比对。
Biochem Biophys Res Commun. 2018 Jul 20;502(3):313-317. doi: 10.1016/j.bbrc.2018.05.134. Epub 2018 May 29.

引用本文的文献

1
Accurate short-read alignment through -index-based pangenome indexing.通过基于索引的泛基因组索引实现准确的短读比对。
Genome Res. 2025 Jul 1;35(7):1609-1620. doi: 10.1101/gr.279858.124.
2
FPGA-based accelerator for adaptive banded event alignment in nanopore sequencing data analysis.用于纳米孔测序数据分析中自适应带状事件对齐的基于现场可编程门阵列的加速器
BMC Bioinformatics. 2025 Mar 17;26(1):83. doi: 10.1186/s12859-024-06011-1.
3
QuickEd: high-performance exact sequence alignment based on bound-and-align.QuickEd:基于绑定与比对的高性能精确序列比对

本文引用的文献

1
Nanopore sequencing and assembly of a human genome with ultra-long reads.纳米孔测序和超长读长组装人类基因组。
Nat Biotechnol. 2018 Apr;36(4):338-345. doi: 10.1038/nbt.4060. Epub 2018 Jan 29.
2
Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.Edlib:一个使用编辑距离进行快速、精确序列比对的C/C++库。
Bioinformatics. 2017 May 1;33(9):1394-1395. doi: 10.1093/bioinformatics/btw753.
3
IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing.
Bioinformatics. 2025 Mar 4;41(3). doi: 10.1093/bioinformatics/btaf112.
4
xRead: a coverage-guided approach for scalable construction of read overlapping graph.XRead:一种用于可扩展构建读段重叠图的覆盖度引导方法。
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf007.
5
BWT construction and search at the terabase scale.万亿碱基规模下的BWT构建与搜索。
Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae717.
6
TSTA: thread and SIMD-based trapezoidal pairwise/multiple sequence-alignment method.TSTA:基于线程和单指令多数据的梯形成对/多序列比对方法。
GigaByte. 2024 Nov 5;2024:gigabyte141. doi: 10.46471/gigabyte.141. eCollection 2024.
7
miniSNV: accurate and fast single nucleotide variant calling from nanopore sequencing data.miniSNV:从纳米孔测序数据中进行准确快速的单核苷酸变异calling。
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae473.
8
BSAlign: A Library for Nucleotide Sequence Alignment.BSAlign:一个核苷酸序列比对库。
Genomics Proteomics Bioinformatics. 2024 Jul 3;22(2). doi: 10.1093/gpbjnl/qzae025.
9
Circular metagenome-assembled genome of Cloacimonadota recovered from anaerobic digestion sludge.从厌氧消化污泥中获得的栖粪杆菌门环状宏基因组组装基因组
Microbiol Resour Announc. 2024 Jul 18;13(7):e0040324. doi: 10.1128/mra.00403-24. Epub 2024 Jun 25.
10
Circular metagenome-assembled genome of Patescibacteria recovered from anaerobic digestion sludge.从厌氧消化污泥中获得的帕氏细菌环状宏基因组组装基因组
Microbiol Resour Announc. 2024 Apr 11;13(4):e0008324. doi: 10.1128/mra.00083-24. Epub 2024 Mar 25.
IDP-ASE:通过杂交测序在基因和基因异构体水平进行单倍型分型和等位基因特异性表达定量分析。
Nucleic Acids Res. 2017 Mar 17;45(5):e32. doi: 10.1093/nar/gkw1076.
4
Discovery and genotyping of structural variation from long-read haploid genome sequence data.从长读单倍体基因组序列数据中发现结构变异并进行基因分型。
Genome Res. 2017 May;27(5):677-685. doi: 10.1101/gr.214007.116. Epub 2016 Nov 28.
5
Fast and sensitive mapping of nanopore sequencing reads with GraphMap.使用GraphMap对纳米孔测序读数进行快速灵敏的映射
Nat Commun. 2016 Apr 15;7:11307. doi: 10.1038/ncomms11307.
6
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.Parasail:用于全局、半全局和局部成对序列比对的SIMD C库。
BMC Bioinformatics. 2016 Feb 10;17:81. doi: 10.1186/s12859-016-0930-z.
7
Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.利用单分子测序和局部敏感哈希组装大型基因组。
Nat Biotechnol. 2015 Jun;33(6):623-30. doi: 10.1038/nbt.3238. Epub 2015 May 25.
8
BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm.BitPAl:一种位并行、通用的整数评分序列比对算法。
Bioinformatics. 2014 Nov 15;30(22):3166-73. doi: 10.1093/bioinformatics/btu507. Epub 2014 Jul 29.
9
PBHoney: identifying genomic variants via long-read discordance and interrupted mapping.PBHoney:通过长读段不一致性和中断映射识别基因组变异体。
BMC Bioinformatics. 2014 Jun 10;15:180. doi: 10.1186/1471-2105-15-180.
10
SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications.SSW 库:一个用于基因组应用的 SIMD Smith-Waterman C/C++ 库。
PLoS One. 2013 Dec 4;8(12):e82138. doi: 10.1371/journal.pone.0082138. eCollection 2013.