Suppr超能文献

带重复和收缩的高效编辑距离

Efficient edit distance with duplications and contractions.

作者信息

Pinhas Tamar, Zakov Shay, Tsur Dekel, Ziv-Ukelson Michal

机构信息

Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel.

Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, USA.

出版信息

Algorithms Mol Biol. 2013 Oct 29;8(1):27. doi: 10.1186/1748-7188-8-27.

Abstract

: We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities with respect to previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of O (|Σ|MP (n)) and O (|Σ|n2), respectively, where |Σ| is the alphabet size, n is the length of the strings, and MP (n) is the time bound for the computation of min-plus matrix multiplication of two n × n matrices (currently, MP(n)=On3log3lognlog2n due to an algorithm by Chan).For integer cost functions, the running time is further improved to O|Σ|n3log2n. In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first n given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval D. Choosing a constant 1log|D|n<λ<1, the algorithm preprocesses an n × n matrix in On2+λ|D| time and On2+λ|D|λ2log|D|2n space. Then, it may multiply the matrix with any given n-length vector in On2λ2log|D|2n time. Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest On3log2n time algorithm for single-strand RNA folding with discrete cost functions.Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in O|Σ|nMP(ñ)ñ time and O(|Σ|nñ) space, where ñ is the length of the run-length encoding of the strings.

摘要

我们提出了三种用于带重复和收缩的字符串编辑距离的算法。其中包括一种高效的通用算法以及在成本函数的某些约束下适用的两种改进算法。新算法解决了一个更通用的问题变体,并且相对于先前的算法获得了更好的时间复杂度。我们的通用算法基于方阵的最小加乘法,时间和空间复杂度分别为O (|Σ|MP (n)) 和O (|Σ|n2),其中|Σ| 是字母表大小,n 是字符串的长度,MP (n) 是两个n×n 矩阵的最小加矩阵乘法计算的时间界限(目前,由于Chan 的算法,MP(n)=On3log3lognlog2n)。对于整数值成本函数,运行时间进一步改进为O|Σ|n3log2n。此外,该算法的这种变体是在线的,即输入字符串可以逐个字母给出,并且其时间复杂度限制了前n 个给定字母的处理时间。这种加速基于我们高效的矩阵 - 向量最小加乘法算法,该算法适用于相邻项之间的差异来自有限整数区间D 的矩阵和向量。选择一个常数1log|D|n<λ<1,该算法在On2+λ|D| 时间和On2+λ|D|λ2log|D|2n 空间中预处理一个n×n 矩阵。然后,它可以在On2λ2log|D|2n 时间内将该矩阵与任何给定的n 长度向量相乘。在一些离散性假设下,这种矩阵 - 向量最小加乘法算法适用于上下文无关语法解析和RNA 折叠领域的几个问题,特别是意味着具有离散成本函数的单链RNA 折叠的渐近最快的On3log2n 时间算法。最后,假设对成本函数有不同的约束,我们提出了该算法的另一个版本,该版本利用字符串的游程编码,运行时间为O|Σ|nMP(ñ)ñ,空间为O(|Σ|nñ),其中ñ 是字符串的游程编码的长度。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b68e/3879238/4fbddda64537/1748-7188-8-27-1.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验