Nicolae Marius, Rajasekaran Sanguthevar
Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way Unit 4155, Storrs, CT 06269, USA.
Inf Process Lett. 2017 Feb;118:78-82. doi: 10.1016/j.ipl.2016.10.003. Epub 2016 Oct 27.
We consider the problem of pattern matching with k mismatches, where the pattern may contain don't care (wild card) characters. Specifically, given a pattern P of length m and a text T of length n, we want to find all occurrences of P in T that have no more than k mismatches. The pattern can have don't care characters, which match any character. Without don't cares, the best known algorithm for pattern matching with k mismatches has a runtime of [Formula: see text]. With don't cares in the pattern, the best deterministic algorithm has a runtime of O(nk polylog m). Therefore, there is an important gap between the versions with and without don't cares. In this paper we give an algorithm whose runtime increases with the number of don't cares. We define an island to be a maximal length substring of P that does not contain don't cares. Let q be the number of islands in P. We present an algorithm that runs in [Formula: see text] time. If the number of islands q is O(k), this runtime becomes [Formula: see text], which essentially matches the best known runtime for pattern matching with k mismatches without don't cares. If the number of islands q is O(k^2), this algorithm is asymptotically faster than the previous best algorithm for pattern matching with k mismatches with don't cares in the pattern.
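To make the problem statement and the notion of an island concrete, here is a minimal Python sketch. It is a naive quadratic-time reference check, not the algorithm of the paper, and it makes no attempt to reach the runtimes quoted above. The wildcard symbol `?` and the function names `islands` and `k_mismatch_occurrences` are assumptions chosen for illustration; the abstract does not specify them.

```python
WILDCARD = "?"  # assumed don't-care symbol; the abstract does not fix one


def islands(pattern, wildcard=WILDCARD):
    """Return the maximal wildcard-free substrings (islands) of the pattern."""
    return [piece for piece in pattern.split(wildcard) if piece]


def k_mismatch_occurrences(text, pattern, k, wildcard=WILDCARD):
    """Naive scan: start positions where pattern matches with at most k mismatches.

    A wildcard in the pattern matches any text character and never
    counts as a mismatch. Runs in O(n * m) time in the worst case.
    """
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for j, p in enumerate(pattern):
            if p != wildcard and p != text[start + j]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            hits.append(start)
    return hits
```

For instance, the pattern `ab?c??de` has three islands (`ab`, `c`, `de`), and searching the text `abcabd` for `a?c` reports position 0 with k = 0 and positions 0 and 3 with k = 1.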