Lucas Simon M, Reynolds T Jeff
Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, Essex C04 35Q, UK.
IEEE Trans Pattern Anal Mach Intell. 2005 Jul;27(7):1063-74. doi: 10.1109/TPAMI.2005.143.
Learning a Deterministic Finite Automaton (DFA) from a training set of labeled strings is a hard task that has been much studied within the machine learning community. It is equivalent to learning a regular language by example and has applications in language modeling. In this paper, we describe a novel evolutionary method for learning DFA that evolves only the transition matrix and uses a simple deterministic procedure to optimally assign state labels. We compare its performance with the Evidence Driven State Merging (EDSM) algorithm, one of the most powerful known DFA learning algorithms. We present results on random DFA induction problems of varying target size and training set density. We also studythe effects of noisy training data on the evolutionary approach and on EDSM. On noise-free data, we find that our evolutionary method outperforms EDSM on small sparse data sets. In the case of noisy training data, we find that our evolutionary method consistently outperforms EDSM, as well as other significant methods submitted to two recent competitions.
从带标签字符串的训练集中学习确定有限自动机(DFA)是一项艰巨的任务,机器学习社区对此进行了大量研究。它等同于通过示例学习正则语言,并在语言建模中有应用。在本文中,我们描述了一种学习DFA的新颖进化方法,该方法仅进化转移矩阵,并使用简单的确定性过程来最优地分配状态标签。我们将其性能与证据驱动状态合并(EDSM)算法进行比较,EDSM算法是已知最强大的DFA学习算法之一。我们给出了不同目标大小和训练集密度的随机DFA归纳问题的结果。我们还研究了有噪声训练数据对进化方法和EDSM的影响。在无噪声数据上,我们发现在小的稀疏数据集上我们的进化方法优于EDSM。在有噪声训练数据的情况下,我们发现我们的进化方法始终优于EDSM以及提交到最近两次竞赛的其他重要方法。