Barton Carl, Iliopoulos Costas S, Pissis Solon P
King's College London, London, UK.
Algorithms Mol Biol. 2014 Mar 22;9(1):9. doi: 10.1186/1748-7188-9-9.
Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.
In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time O(n). Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time O(n) for moderate values of k, that is k=O(m/logm). We show how the same results can be easily obtained under the edit distance model. The presented algorithms are also implemented as library functions. Experimental results demonstrate that the functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach.
We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at http://www.inf.kcl.ac.uk/research/projects/asmf/.
循环字符串匹配是一个在许多生物学情境中自然出现的问题。它在于在长度为n的文本中找到长度为m的模式的所有旋转出现情况。存在用于精确循环字符串匹配的最优平均情况算法。近似循环字符串匹配是一个相当未充分发展的领域。
在本文中,我们提出一种用于精确循环字符串匹配的次优平均情况算法,其时间复杂度为O(n)。基于我们对精确情况的解决方案,我们提出两种用于在汉明距离模型下具有k个错配的近似循环字符串匹配的快速平均情况算法,对于适度的k值,即k = O(m / log m),时间复杂度为O(n)。我们展示了在编辑距离模型下如何轻松获得相同的结果。所提出的算法也被实现为库函数。实验结果表明,与朴素方法相比,该库中提供的函数将计算速度提高了三个多数量级。
我们提出了两种用于具有k个错配的近似循环字符串匹配的快速平均情况算法;并表明它们在实践中也表现得非常好。我们贡献的重要性体现在所提供的函数可以无缝集成到任何生物学流程这一事实上。该库的源代码可在http://www.inf.kcl.ac.uk/research/projects/asmf/免费获取。