
Novel maximum-margin training algorithms for supervised neural networks.

Author Information

Ludwig Oswaldo, Nunes Urbano

Affiliation

ISR-Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra Polo II, 3030-290 Coimbra, Portugal.

Publication Information

IEEE Trans Neural Netw. 2010 Jun;21(6):972-84. doi: 10.1109/TNN.2010.2046423. Epub 2010 Apr 19.

Abstract

This paper proposes three novel training methods for multilayer perceptron (MLP) binary classifiers: two based on the backpropagation approach and a third based on information theory. Both backpropagation methods are based on the maximal-margin (MM) principle. The first one, based on the gradient descent with adaptive learning rate algorithm (GDX) and named maximum-margin GDX (MMGDX), directly increases the margin of the MLP output-layer hyperplane. The proposed method jointly optimizes both MLP layers in a single process, backpropagating the gradient of an MM-based objective function through the output and hidden layers in order to create a hidden-layer space that enables a higher margin for the output-layer hyperplane, avoiding the testing of many arbitrary kernels, as occurs in the case of support vector machine (SVM) training. The proposed MM-based objective function aims to stretch out the margin to its limit. An objective function based on the Lp-norm is also proposed in order to take into account the idea of support vectors, while avoiding the complexity involved in solving the constrained optimization problem that usually arises in SVM training. In fact, all the training methods proposed in this paper have time and space complexity O(N), while usual SVM training methods have time complexity O(N^3) and space complexity O(N^2), where N is the training-data-set size. The second approach, named minimization of interclass interference (MICI), has an objective function inspired by Fisher discriminant analysis. This algorithm aims to create an MLP hidden output in which the patterns have a desirable statistical distribution. In both training methods, the maximum area under the ROC curve (AUC) is applied as the stopping criterion. The third approach offers a robust training framework able to take the best of each proposed training method.
The main idea is to compose a neural model using neurons extracted from three other neural networks, each previously trained by MICI, MMGDX, and Levenberg-Marquardt (LM), respectively. The resulting neural network is named the assembled neural network (ASNN). Benchmark data sets of real-world problems have been used in experiments that enable a comparison with other state-of-the-art classifiers. The results provide evidence of the effectiveness of our methods regarding accuracy, AUC, and balanced error rate.
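The joint, backpropagation-based margin maximization described in the abstract can be illustrated with a minimal sketch. The hinge-style loss, network size, and toy data below are illustrative assumptions, not the paper's exact MMGDX objective (which is defined only in the full text); the sketch shows the general idea of backpropagating a margin-based gradient through both MLP layers in a single process:

```python
import numpy as np

# Hedged sketch: a one-hidden-layer MLP for labels y in {-1, +1}, trained by
# backpropagating the gradient of a hinge (margin) loss so that both layers
# are optimized jointly, as the abstract describes. The hinge loss here is a
# stand-in for the paper's MM-based objective.

rng = np.random.default_rng(0)

# Toy linearly separable data: 2-D points, label = sign of x0 + x1.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

n_hidden = 8
W1 = rng.normal(scale=0.5, size=(2, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.5, size=n_hidden)        # hidden -> output weights
b2 = 0.0
lr = 0.05

for _ in range(300):
    H = np.tanh(X @ W1 + b1)          # hidden-layer activations
    f = H @ w2 + b2                   # output-layer score
    margin = y * f
    active = margin < 1.0             # patterns violating the margin
    # Gradient of the mean hinge loss max(0, 1 - y*f) w.r.t. f:
    df = np.where(active, -y, 0.0) / len(X)
    # Backpropagate through output and hidden layers (joint optimization).
    dw2 = H.T @ df
    db2 = df.sum()
    dH = np.outer(df, w2) * (1.0 - H ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dH
    db1 = dH.sum(axis=0)
    w2 -= lr * dw2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

pred = np.sign(np.tanh(X @ W1 + b1) @ w2 + b2)
accuracy = (pred == y).mean()
```

Because the margin gradient flows through the hidden layer as well, the hidden representation itself is shaped to admit a wider output-layer margin, which is the point the abstract makes in contrast to fixing an arbitrary kernel as in SVM training; each full-batch pass costs O(N) time and space in the training-set size N.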

