• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

基于 De Bruijn 图的序列比对问题的难度研究

On the Hardness of Sequence Alignment on De Bruijn Graphs.

机构信息

School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.

Department of Computer Science, North Carolina State University, Raleigh, North Carolina, USA.

出版信息

J Comput Biol. 2022 Dec;29(12):1377-1396. doi: 10.1089/cmb.2022.0411. Epub 2022 Nov 25.

DOI:10.1089/cmb.2022.0411
PMID:36450127
Abstract

The problem of aligning a sequence to a walk in a labeled graph is of fundamental importance to Computational Biology. For an arbitrary graph and a pattern of length , a lower bound based on the Strong Exponential Time Hypothesis implies that an algorithm for finding a walk in exactly matching significantly faster than time is unlikely. However, for many special graphs, such as de Bruijn graphs, the problem can be solved in linear time. For approximate matching, the picture is more complex. When edits (substitutions, insertions, and deletions) are only allowed to the pattern, or when the graph is acyclic, the problem is solvable in time. When edits are allowed to arbitrary cyclic graphs, the problem becomes NP-complete, even on binary alphabets. Moreover, NP-completeness continues to hold even when edits are restricted to only substitutions. Despite the popularity of the de Bruijn graphs in Computational Biology, the complexity of approximate pattern matching on the de Bruijn graphs remained unknown. We investigate this problem and show that the properties that make the de Bruijn graphs amenable to efficient exact pattern matching do not extend to approximate matching, even when restricted to the substitutions only case with alphabet size four. Specifically, we prove that determining the existence of a matching walk in a de Bruijn graph is NP-complete when substitutions are allowed to the graph. We also demonstrate that an algorithm significantly faster than is unlikely for the de Bruijn graphs in the case where substitutions are only allowed to the pattern. This stands in contrast to pattern-to-text matching where exact matching is solvable in linear time, such as on the de Bruijn graphs, but approximate matching under substitutions is solvable in subquadratic time, where is the text's length.

摘要

将序列与标记图中的路径对齐的问题对计算生物学具有重要意义。对于任意图 和长度为 的模式 ,基于强指数时间假设的下界意味着,找到与 精确匹配的 中的路径的算法不太可能比 时间快。然而,对于许多特殊图,例如 de Bruijn 图,该问题可以在线性时间内解决。对于近似匹配,情况更为复杂。当仅允许对模式进行编辑(替换、插入和删除)时,或者当图是无环时,该问题可以在 时间内解决。当编辑允许任意循环图时,该问题就会变得 NP 完全,即使在二进制字母表上也是如此。此外,即使编辑仅限于替换,NP 完全性仍然成立。尽管 de Bruijn 图在计算生物学中很受欢迎,但在 de Bruijn 图上进行近似模式匹配的复杂性仍然未知。我们研究了这个问题,并表明,使 de Bruijn 图易于进行高效精确模式匹配的特性并不能扩展到近似匹配,即使在仅允许替换的情况下也是如此,并且字母表大小为 4。具体来说,我们证明了当允许对图进行替换时,在 de Bruijn 图中确定是否存在匹配路径是 NP 完全的。我们还证明了在仅允许替换模式的情况下,对于 de Bruijn 图,不太可能存在比 更快的算法。这与模式到文本匹配形成对比,在模式到文本匹配中,精确匹配可以在线性时间内解决,例如在 de Bruijn 图上,但在允许替换的情况下,近似匹配可以在亚二次 时间内解决,其中 是文本的长度。

相似文献

1
On the Hardness of Sequence Alignment on De Bruijn Graphs.基于 De Bruijn 图的序列比对问题的难度研究
J Comput Biol. 2022 Dec;29(12):1377-1396. doi: 10.1089/cmb.2022.0411. Epub 2022 Nov 25.
2
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.用于构建大型双向 de Bruijn 图的高效并行和外核算法。
BMC Bioinformatics. 2010 Nov 15;11:560. doi: 10.1186/1471-2105-11-560.
3
AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS.一种在双向德布鲁因图上的中国邮路问题的高效算法。
Discrete Math Algorithms Appl. 2010;1:184-196. doi: 10.1007/978-3-642-17458-2_16.
4
Bit-parallel sequence-to-graph alignment.位并行序列到图的对齐。
Bioinformatics. 2019 Oct 1;35(19):3599-3607. doi: 10.1093/bioinformatics/btz162.
5
Pan-genome de Bruijn graph using the bidirectional FM-index.基于双向 FM-index 的泛基因组 de Bruijn 图
BMC Bioinformatics. 2023 Oct 26;24(1):400. doi: 10.1186/s12859-023-05531-6.
6
deBGR: an efficient and near-exact representation of the weighted de Bruijn graph.deBGR:一种高效且近乎精确的加权 de Bruijn 图表示方法。
Bioinformatics. 2017 Jul 15;33(14):i133-i141. doi: 10.1093/bioinformatics/btx261.
7
Lossless indexing with counting de Bruijn graphs.基于计数型 de Bruijn 图的无损索引
Genome Res. 2022 Sep 27;32(9):1754-1764. doi: 10.1101/gr.276607.122.
8
Read mapping on de Bruijn graphs.在德布鲁因图上进行读段映射。
BMC Bioinformatics. 2016 Jun 16;17(1):237. doi: 10.1186/s12859-016-1103-9.
9
Practical dynamic de Bruijn graphs.实用动态 de Bruijn 图。
Bioinformatics. 2018 Dec 15;34(24):4189-4195. doi: 10.1093/bioinformatics/bty500.
10
BrownieAligner: accurate alignment of Illumina sequencing data to de Bruijn graphs.BrownieAligner:Illumina 测序数据到 de Bruijn 图的精确比对。
BMC Bioinformatics. 2018 Sep 4;19(1):311. doi: 10.1186/s12859-018-2319-7.

引用本文的文献

1
Pangenome comparison via ED strings.通过编辑距离字符串进行泛基因组比较。
Front Bioinform. 2024 Sep 26;4:1397036. doi: 10.3389/fbinf.2024.1397036. eCollection 2024.
2
Haplotype-aware sequence alignment to pangenome graphs.基于单倍型感知的序列比对到泛基因组图谱。
Genome Res. 2024 Oct 11;34(9):1265-1275. doi: 10.1101/gr.279143.124.
3
Label-guided seed-chain-extend alignment on annotated De Bruijn graphs.带标签的种子链扩展对齐标注的 De Bruijn 图。
Bioinformatics. 2024 Jun 28;40(Suppl 1):i337-i346. doi: 10.1093/bioinformatics/btae226.
4
Chaining for accurate alignment of erroneous long reads to acyclic variation graphs.基于无环变异图的错误长读精确比对链。
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad460.