Suppr超能文献

近似循环字符串匹配的快速算法。

Fast algorithms for approximate circular string matching.

作者信息

Barton Carl, Iliopoulos Costas S, Pissis Solon P

机构信息

King's College London, London, UK.

出版信息

Algorithms Mol Biol. 2014 Mar 22;9(1):9. doi: 10.1186/1748-7188-9-9.

Abstract

BACKGROUND

Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.

RESULTS

In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time O(n). Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time O(n) for moderate values of k, that is k=O(m/logm). We show how the same results can be easily obtained under the edit distance model. The presented algorithms are also implemented as library functions. Experimental results demonstrate that the functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach.

CONCLUSIONS

We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at http://www.inf.kcl.ac.uk/research/projects/asmf/.

摘要

背景

循环字符串匹配是一个在许多生物学情境中自然出现的问题。它在于在长度为n的文本中找到长度为m的模式的所有旋转出现情况。存在用于精确循环字符串匹配的最优平均情况算法。近似循环字符串匹配是一个相当未充分发展的领域。

结果

在本文中,我们提出一种用于精确循环字符串匹配的次优平均情况算法,其时间复杂度为O(n)。基于我们对精确情况的解决方案,我们提出两种用于在汉明距离模型下具有k个错配的近似循环字符串匹配的快速平均情况算法,对于适度的k值,即k = O(m / log m),时间复杂度为O(n)。我们展示了在编辑距离模型下如何轻松获得相同的结果。所提出的算法也被实现为库函数。实验结果表明,与朴素方法相比,该库中提供的函数将计算速度提高了三个多数量级。

结论

我们提出了两种用于具有k个错配的近似循环字符串匹配的快速平均情况算法;并表明它们在实践中也表现得非常好。我们贡献的重要性体现在所提供的函数可以无缝集成到任何生物学流程这一事实上。该库的源代码可在http://www.inf.kcl.ac.uk/research/projects/asmf/免费获取。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0284/4234210/81bd1f10e58a/1748-7188-9-9-1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验