• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

位并行序列到图的对齐。

Bit-parallel sequence-to-graph alignment.

机构信息

Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, 66123 Saarbrücken, Germany.

Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany.

出版信息

Bioinformatics. 2019 Oct 1;35(19):3599-3607. doi: 10.1093/bioinformatics/btz162.

DOI:10.1093/bioinformatics/btz162
PMID:30851095
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC6761980/
Abstract

MOTIVATION

Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction and variant calling with respect to a variation graph.

RESULTS

We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers' bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of up to w over naive algorithms. For a graph with |V| nodes and |E| edges and a sequence of length m, our bitvector-based graph alignment algorithm reaches a worst case runtime of O(|V|+⌈mw⌉|E| log w) for acyclic graphs and O(|V|+m|E| log w) for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared with a previous (asymptotically optimal) alignment algorithm.

AVAILABILITY AND IMPLEMENTATION

https://github.com/maickrau/GraphAligner.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

摘要

动机

图通常用于表示序列集。边或节点都可以由序列标记,因此图中的每条路径都拼写一个连接的序列。例如,包括表示基因组组装的图,如字符串图和 de Bruijn 图,以及表示泛基因组的图,因此表示种群中存在的遗传变异。能够将测序读取与这些图对齐是许多分析的关键步骤,其应用包括基因组组装、读取错误校正和针对变异图的变异调用。

结果

我们将两种线性序列到序列算法推广到图中:用于精确匹配的 Shift-And 算法和用于半全局对齐的 Myers 位向量算法。这两种线性算法都是基于使用固定数量的操作处理 w 个序列字符,其中 w 是机器的字大小(通常为 64),并实现高达 w 的速度提升超过了原始算法。对于具有 |V| 个节点和 |E| 条边且长度为 m 的序列,我们基于位向量的图对齐算法在非循环图中的最坏情况运行时间为 O(|V|+⌈mw⌉|E|logw),在任意循环图中的运行时间为 O(|V|+m|E|logw)。我们将其应用于五种不同类型的图,并观察到与以前的(渐近最优)对齐算法相比,速度提高了 3 倍到 20 倍。

可用性和实现

https://github.com/maickrau/GraphAligner。

补充信息

补充数据可在 Bioinformatics 在线获得。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/f62d7febf43f/btz162f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/a8bceef9b04e/btz162f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/9243324741db/btz162f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/44f90d149ed3/btz162f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/d04a8a519037/btz162f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/f62d7febf43f/btz162f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/a8bceef9b04e/btz162f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/9243324741db/btz162f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/44f90d149ed3/btz162f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/d04a8a519037/btz162f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b763/6761980/f62d7febf43f/btz162f5.jpg

相似文献

1
Bit-parallel sequence-to-graph alignment.位并行序列到图的对齐。
Bioinformatics. 2019 Oct 1;35(19):3599-3607. doi: 10.1093/bioinformatics/btz162.
2
Chaining for accurate alignment of erroneous long reads to acyclic variation graphs.基于无环变异图的错误长读精确比对链。
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad460.
3
GraphAligner: rapid and versatile sequence-to-graph alignment.GraphAligner:快速且通用的序列到图的对齐方法。
Genome Biol. 2020 Sep 24;21(1):253. doi: 10.1186/s13059-020-02157-2.
4
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.用于构建大型双向 de Bruijn 图的高效并行和外核算法。
BMC Bioinformatics. 2010 Nov 15;11:560. doi: 10.1186/1471-2105-11-560.
5
Integration of string and de Bruijn graphs for genome assembly.用于基因组组装的弦图与德布鲁因图整合
Bioinformatics. 2016 May 1;32(9):1301-7. doi: 10.1093/bioinformatics/btw011. Epub 2016 Jan 10.
6
Constructing small genome graphs via string compression.通过字符串压缩构建小基因组图。
Bioinformatics. 2021 Jul 12;37(Suppl_1):i205-i213. doi: 10.1093/bioinformatics/btab281.
7
On the Hardness of Sequence Alignment on De Bruijn Graphs.基于 De Bruijn 图的序列比对问题的难度研究
J Comput Biol. 2022 Dec;29(12):1377-1396. doi: 10.1089/cmb.2022.0411. Epub 2022 Nov 25.
8
Integrating long-range connectivity information into de Bruijn graphs.将长程连接信息整合到 de Bruijn 图中。
Bioinformatics. 2018 Aug 1;34(15):2556-2565. doi: 10.1093/bioinformatics/bty157.
9
Sequence Alignment on Directed Graphs.有向图上的序列比对
J Comput Biol. 2019 Jan;26(1):53-67. doi: 10.1089/cmb.2017.0264. Epub 2018 Sep 8.
10
AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS.一种在双向德布鲁因图上的中国邮路问题的高效算法。
Discrete Math Algorithms Appl. 2010;1:184-196. doi: 10.1007/978-3-642-17458-2_16.

引用本文的文献

1
A survey of sequence-to-graph mapping algorithms in the pangenome era.泛基因组时代序列到图谱映射算法综述。
Genome Biol. 2025 May 22;26(1):138. doi: 10.1186/s13059-025-03606-6.
2
Pangenome comparison via ED strings.通过编辑距离字符串进行泛基因组比较。
Front Bioinform. 2024 Sep 26;4:1397036. doi: 10.3389/fbinf.2024.1397036. eCollection 2024.
3
Haplotype-aware sequence alignment to pangenome graphs.基于单倍型感知的序列比对到泛基因组图谱。

本文引用的文献

1
Multi-platform discovery of haplotype-resolved structural variation in human genomes.多平台发现人类基因组中单体型分辨率结构变异。
Nat Commun. 2019 Apr 16;10(1):1784. doi: 10.1038/s41467-018-08148-z.
2
BGSA: a bit-parallel global sequence alignment toolkit for multi-core and many-core architectures.BGSA:用于多核和众核架构的位并行全局序列比对工具包。
Bioinformatics. 2019 Jul 1;35(13):2306-2308. doi: 10.1093/bioinformatics/bty930.
3
Variation graph toolkit improves read mapping by representing genetic variation in the reference.
Genome Res. 2024 Oct 11;34(9):1265-1275. doi: 10.1101/gr.279143.124.
4
NIPT-PG: empowering non-invasive prenatal testing to learn from population genomics through an incremental pan-genomic approach.NIPT-PG:通过渐进式泛基因组方法,赋予无创产前检测从人群基因组学中学习的能力。
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae266.
5
RecGraph: recombination-aware alignment of sequences to variation graphs.RecGraph:面向变异图的重组感知序列比对。
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae292.
6
Chaining for accurate alignment of erroneous long reads to acyclic variation graphs.基于无环变异图的错误长读精确比对链。
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad460.
7
Computational graph pangenomics: a tutorial on data structures and their applications.计算图泛基因组学:数据结构及其应用教程
Nat Comput. 2022 Mar;21(1):81-108. doi: 10.1007/s11047-022-09882-6. Epub 2022 Mar 4.
8
VariantStore: an index for large-scale genomic variant search.变体存储:用于大规模基因组变体搜索的索引。
Genome Biol. 2021 Aug 19;22(1):231. doi: 10.1186/s13059-021-02442-8.
9
STRONG: metagenomics strain resolution on assembly graphs.基于组装图的宏基因组菌株分辨率
Genome Biol. 2021 Jul 26;22(1):214. doi: 10.1186/s13059-021-02419-7.
10
Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs.利用重复泛基因组图对人群进行可变串联重复序列变异分析。
Nat Commun. 2021 Jul 12;12(1):4250. doi: 10.1038/s41467-021-24378-0.
变异图谱工具包通过表示参考中的遗传变异来提高读映射质量。
Nat Biotechnol. 2018 Oct;36(9):875-879. doi: 10.1038/nbt.4227. Epub 2018 Aug 20.
4
Minimap2: pairwise alignment for nucleotide sequences.Minimap2:核苷酸序列的两两比对。
Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191.
5
Genome graphs and the evolution of genome inference.基因组图谱与基因组推断的演变
Genome Res. 2017 May;27(5):665-676. doi: 10.1101/gr.214155.116. Epub 2017 Mar 30.
6
High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.使用群体参考图从全基因组测序数据中进行高精度HLA分型推断
PLoS Comput Biol. 2016 Oct 28;12(10):e1005151. doi: 10.1371/journal.pcbi.1005151. eCollection 2016 Oct.
7
Computational pan-genomics: status, promises and challenges.计算泛基因组学:现状、前景与挑战。
Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089.
8
Compacting de Bruijn graphs from sequencing data quickly and in low memory.从测序数据中快速且低内存地压缩德布鲁因图。
Bioinformatics. 2016 Jun 15;32(12):i201-i208. doi: 10.1093/bioinformatics/btw279.
9
Read mapping on de Bruijn graphs.在德布鲁因图上进行读段映射。
BMC Bioinformatics. 2016 Jun 16;17(1):237. doi: 10.1186/s12859-016-1103-9.
10
hybridSPAdes: an algorithm for hybrid assembly of short and long reads.hybridSPAdes:一种用于短读长和长读长混合组装的算法。
Bioinformatics. 2016 Apr 1;32(7):1009-15. doi: 10.1093/bioinformatics/btv688. Epub 2015 Nov 20.