Vahed A, Omlin C W
Department of Computer Science, University of the Western Cape, Bellville, South Africa.
Neural Comput. 2004 Jan;16(1):59-71. doi: 10.1162/08997660460733994.
Neural networks do not readily provide an explanation of the knowledge stored in their weights as part of their information processing; until recently, they were considered black boxes whose stored knowledge was not readily accessible. Since then, research has produced a number of algorithms for extracting knowledge in symbolic form from trained neural networks. This article addresses the extraction of symbolic knowledge from recurrent neural networks trained to behave like deterministic finite-state automata (DFAs). To date, methods for extracting knowledge from such networks have relied on the hypothesis that network states tend to cluster and that these clusters correspond to DFA states. The computational complexity of such cluster analysis has led to heuristics that either limit the number of clusters that may form during training or limit the exploration of the space of hidden recurrent state neurons. These limitations, while necessary, may reduce fidelity: the extracted knowledge may fail to model the true behavior of the trained network, perhaps not even on the training set. The method proposed here uses a polynomial-time symbolic learning algorithm to infer DFAs solely from observation of a trained network's input-output behavior, and thus has the potential to increase the fidelity of the extracted knowledge.
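The abstract does not name the symbolic learner used; a classic polynomial-time choice for inferring a DFA purely from input-output (membership) queries is an Angluin-style L* observation-table algorithm, sketched below. All names here (`member`, `infer_dfa`, `ALPHABET`) are illustrative, the toy oracle (even number of a's) merely stands in for a trained network, and the equivalence query is approximated by exhaustive testing up to a bounded length; this is a sketch of the general query-based approach, not the authors' implementation.

```python
from itertools import product

ALPHABET = "ab"

def member(w):
    # Toy black-box oracle standing in for the trained recurrent network:
    # accepts strings over {a, b} containing an even number of a's.
    return w.count("a") % 2 == 0

def infer_dfa(member, alphabet, max_len=6):
    """Infer a DFA from membership queries alone (Angluin-style L*).

    Equivalence queries are approximated by exhaustively comparing the
    hypothesis DFA against the oracle on all strings up to max_len.
    """
    prefixes, suffixes = [""], [""]
    while True:
        def row(p):
            # A state's "signature": oracle answers over all suffixes.
            return tuple(member(p + s) for s in suffixes)

        # Close the observation table: every one-letter extension of a
        # known prefix must produce a row we have already seen.
        changed = True
        while changed:
            changed = False
            rows = {row(p) for p in prefixes}
            for p in list(prefixes):
                for a in alphabet:
                    r = row(p + a)
                    if r not in rows:
                        prefixes.append(p + a)
                        rows.add(r)
                        changed = True

        # Build the hypothesis DFA: one state per distinct row.
        state_of, reps = {}, []
        for p in prefixes:
            r = row(p)
            if r not in state_of:
                state_of[r] = len(reps)
                reps.append(p)
        start = state_of[row("")]
        accept = {state_of[row(p)] for p in reps if member(p)}
        trans = {(state_of[row(p)], a): state_of[row(p + a)]
                 for p in reps for a in alphabet}

        def run(w):
            q = start
            for c in w:
                q = trans[(q, c)]
            return q in accept

        # Approximate equivalence query: search for a counterexample.
        cex = next(("".join(t) for n in range(max_len + 1)
                    for t in product(alphabet, repeat=n)
                    if run("".join(t)) != member("".join(t))), None)
        if cex is None:
            return start, trans, accept

        # Add every suffix of the counterexample as a distinguishing suffix.
        for k in range(len(cex) + 1):
            if cex[k:] not in suffixes:
                suffixes.append(cex[k:])

start, trans, accept = infer_dfa(member, ALPHABET)
```

For the parity language above, the learner converges to the minimal two-state DFA; because it only queries the oracle, fidelity is limited solely by the equivalence check rather than by how the network's hidden states happen to cluster.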