• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

带重复和收缩的高效编辑距离

Efficient edit distance with duplications and contractions.

作者信息

Pinhas Tamar, Zakov Shay, Tsur Dekel, Ziv-Ukelson Michal

机构信息

Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel.

Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, USA.

出版信息

Algorithms Mol Biol. 2013 Oct 29;8(1):27. doi: 10.1186/1748-7188-8-27.

DOI:10.1186/1748-7188-8-27
PMID:24168705
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3879238/
Abstract

: We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities with respect to previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of O (|Σ|MP (n)) and O (|Σ|n2), respectively, where |Σ| is the alphabet size, n is the length of the strings, and MP (n) is the time bound for the computation of min-plus matrix multiplication of two n × n matrices (currently, MP(n)=On3log3lognlog2n due to an algorithm by Chan).For integer cost functions, the running time is further improved to O|Σ|n3log2n. In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first n given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval D. Choosing a constant 1log|D|n<λ<1, the algorithm preprocesses an n × n matrix in On2+λ|D| time and On2+λ|D|λ2log|D|2n space. Then, it may multiply the matrix with any given n-length vector in On2λ2log|D|2n time. Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest On3log2n time algorithm for single-strand RNA folding with discrete cost functions.Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in O|Σ|nMP(ñ)ñ time and O(|Σ|nñ) space, where ñ is the length of the run-length encoding of the strings.

摘要

我们提出了三种用于带重复和收缩的字符串编辑距离的算法。其中包括一种高效的通用算法以及在成本函数的某些约束下适用的两种改进算法。新算法解决了一个更通用的问题变体,并且相对于先前的算法获得了更好的时间复杂度。我们的通用算法基于方阵的最小加乘法,时间和空间复杂度分别为O (|Σ|MP (n)) 和O (|Σ|n2),其中|Σ| 是字母表大小,n 是字符串的长度,MP (n) 是两个n×n 矩阵的最小加矩阵乘法计算的时间界限(目前,由于Chan 的算法,MP(n)=On3log3lognlog2n)。对于整数值成本函数,运行时间进一步改进为O|Σ|n3log2n。此外,该算法的这种变体是在线的,即输入字符串可以逐个字母给出,并且其时间复杂度限制了前n 个给定字母的处理时间。这种加速基于我们高效的矩阵 - 向量最小加乘法算法,该算法适用于相邻项之间的差异来自有限整数区间D 的矩阵和向量。选择一个常数1log|D|n<λ<1,该算法在On2+λ|D| 时间和On2+λ|D|λ2log|D|2n 空间中预处理一个n×n 矩阵。然后,它可以在On2λ2log|D|2n 时间内将该矩阵与任何给定的n 长度向量相乘。在一些离散性假设下,这种矩阵 - 向量最小加乘法算法适用于上下文无关语法解析和RNA 折叠领域的几个问题,特别是意味着具有离散成本函数的单链RNA 折叠的渐近最快的On3log2n 时间算法。最后,假设对成本函数有不同的约束,我们提出了该算法的另一个版本,该版本利用字符串的游程编码,运行时间为O|Σ|nMP(ñ)ñ,空间为O(|Σ|nñ),其中ñ 是字符串的游程编码的长度。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/a9cdb659581e/1748-7188-8-27-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/4fbddda64537/1748-7188-8-27-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/ab07c47261db/1748-7188-8-27-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/a066da64c8c7/1748-7188-8-27-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/07ed50e1a58b/1748-7188-8-27-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/a9cdb659581e/1748-7188-8-27-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/4fbddda64537/1748-7188-8-27-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/ab07c47261db/1748-7188-8-27-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/a066da64c8c7/1748-7188-8-27-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/07ed50e1a58b/1748-7188-8-27-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/a9cdb659581e/1748-7188-8-27-5.jpg

相似文献

1
Efficient edit distance with duplications and contractions.带重复和收缩的高效编辑距离
Algorithms Mol Biol. 2013 Oct 29;8(1):27. doi: 10.1186/1748-7188-8-27.
2
Linear space string correction algorithm using the Damerau-Levenshtein distance.基于 Damerau-Levenshtein 距离的线性空间字符串校正算法。
BMC Bioinformatics. 2020 Dec 9;21(Suppl 1):4. doi: 10.1186/s12859-019-3184-8.
3
Fast exact algorithms for the closest string and substring problems with application to the planted (L, d)-motif model.快速精确算法求解最接近字符串和子字符串问题及其在 (L, d)-基序模型中的应用。
IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1400-10. doi: 10.1109/TCBB.2011.21.
4
Improved algorithms for approximate string matching (extended abstract).用于近似字符串匹配的改进算法(扩展摘要)。
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S10. doi: 10.1186/1471-2105-10-S1-S10.
5
String correction using the Damerau-Levenshtein distance.使用 Damerau-Levenshtein 距离进行字符串校正。
BMC Bioinformatics. 2019 Jun 6;20(Suppl 11):277. doi: 10.1186/s12859-019-2819-0.
6
A fast algorithm for the optimal alignment of three strings.一种用于三个字符串最优比对的快速算法。
J Theor Biol. 1993 Sep 21;164(2):261-9. doi: 10.1006/jtbi.1993.1153.
7
A memory-efficient data structure representing exact-match overlap graphs with application for next-generation DNA assembly.一种内存效率高的数据结构,用于表示精确匹配的重叠图,适用于下一代 DNA 组装。
Bioinformatics. 2011 Jul 15;27(14):1901-7. doi: 10.1093/bioinformatics/btr321. Epub 2011 Jun 2.
8
Efficient sequential and parallel algorithms for finding edit distance based motifs.用于查找基于编辑距离的基序的高效顺序和并行算法。
BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):465. doi: 10.1186/s12864-016-2789-9.
9
Short superstrings and the structure of overlapping strings.短超弦与重叠弦的结构
J Comput Biol. 1995 Summer;2(2):307-32. doi: 10.1089/cmb.1995.2.307.
10
Error Tree: A Tree Structure for Hamming and Edit Distances and Wildcards Matching.错误树:用于汉明距离、编辑距离和通配符匹配的树结构。
J Comput Biol. 2015 Dec;22(12):1118-28. doi: 10.1089/cmb.2015.0132. Epub 2015 Sep 24.

引用本文的文献

1
Approximating edit distances between complex tandem repeats efficiently.高效近似复杂串联重复序列之间的编辑距离。
Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf155.
2
A landscape of complex tandem repeats within individual human genomes.个体人类基因组内复杂串联重复的景观。
Nat Commun. 2023 Sep 14;14(1):5530. doi: 10.1038/s41467-023-41262-1.
3
An improved Four-Russians method and sparsified Four-Russians algorithm for RNA folding.一种用于RNA折叠的改进型四俄罗斯人方法和稀疏化四俄罗斯人算法。

本文引用的文献

1
Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach.使用瓦利安特的方法,减少一类RNA和上下文无关文法问题的最坏情况运行时间。
Algorithms Mol Biol. 2011 Aug 18;6(1):20. doi: 10.1186/1748-7188-6-20.
2
A simple, practical and complete O(n3/log n)-time algorithm for RNA folding using the Four-Russians speedup.一种使用四俄罗斯人加速法的简单、实用且完整的 O(n3/log n) 时间复杂度的 RNA 折叠算法。
Algorithms Mol Biol. 2010 Jan 4;5:13. doi: 10.1186/1748-7188-5-13.
3
A fast and specific alignment method for minisatellite maps.
Algorithms Mol Biol. 2016 Aug 5;11:22. doi: 10.1186/s13015-016-0081-9. eCollection 2016.
一种快速且特异的微卫星图谱排列方法。
Evol Bioinform Online. 2007 Feb 22;2:303-20.
4
Alignment of minisatellite maps based on run-length encoding scheme.基于游程编码方案的小卫星图谱比对。
J Bioinform Comput Biol. 2009 Apr;7(2):287-308. doi: 10.1142/s0219720009004060.
5
Comparison of minisatellites.微卫星的比较。
J Comput Biol. 2003;10(3-4):357-72. doi: 10.1089/10665270360688066.
6
Y-chromosome-specific microsatellite mutation rates re-examined using a minisatellite, MSY1.使用小卫星MSY1重新审视Y染色体特异性微卫星突变率。
Hum Mol Genet. 1999 Oct;8(11):2117-20. doi: 10.1093/hmg/8.11.2117.
7
Fast algorithm for predicting the secondary structure of single-stranded RNA.预测单链RNA二级结构的快速算法
Proc Natl Acad Sci U S A. 1980 Nov;77(11):6309-13. doi: 10.1073/pnas.77.11.6309.
8
A general method applicable to the search for similarities in the amino acid sequence of two proteins.一种适用于寻找两种蛋白质氨基酸序列相似性的通用方法。
J Mol Biol. 1970 Mar;48(3):443-53. doi: 10.1016/0022-2836(70)90057-4.