Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur.
College of Computer and Information technology, Shaqra university, Saudi Arabia.
PLoS One. 2018 Jul 26;13(7):e0200912. doi: 10.1371/journal.pone.0200912. eCollection 2018.
Exact pattern matching algorithms are popular and used widely in several applications, such as molecular biology, text processing, image processing, web search engines, network intrusion detection systems and operating systems. The focus of these algorithms is to achieve time efficiency according to applications but not memory consumption. In this work, we propose a novel idea to achieve both time efficiency and memory consumption by splitting query string for searching in Corpus. For a given text, the proposed algorithm split the query pattern into two equal halves and considers the second (right) half as a query string for searching in Corpus. Once the match is found with second halves, the proposed algorithm applies brute force procedure to find remaining match by referring the location of right half. Experimental results on different S1 Dataset, namely Arabic, English, Chinese, Italian and French text databases show that the proposed algorithm outperforms the existing S1 Algorithm in terms of time efficiency and memory consumption as the length of the query pattern increases.
精确模式匹配算法在多个应用中很受欢迎,应用广泛,例如分子生物学、文本处理、图像处理、网络搜索引擎、网络入侵检测系统和操作系统。这些算法的重点是根据应用程序实现时间效率,而不是内存消耗。在这项工作中,我们提出了一种新的想法,通过分割查询字符串来在语料库中实现时间效率和内存消耗的兼顾。对于给定的文本,所提出的算法将查询模式分成两个相等的部分,并将第二部分(右半部分)视为在语料库中进行搜索的查询字符串。一旦找到第二部分的匹配,所提出的算法通过引用右半部分的位置,应用暴力搜索程序来找到其余的匹配。在不同的 S1 数据集上的实验结果,即阿拉伯语、英语、中文、意大利语和法语文本数据库,表明随着查询模式长度的增加,所提出的算法在时间效率和内存消耗方面优于现有的 S1 算法。