Oyang Yen-Jen, Hwang Shien-Ching, Ou Yu-Yen, Chen Chien-Yu, Chen Zhi-Wei
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 106, Taiwan, ROC.
IEEE Trans Neural Netw. 2005 Jan;16(1):225-36. doi: 10.1109/TNN.2004.836229.
This paper presents a novel learning algorithm for efficient construction of radial basis function (RBF) networks that can deliver the same level of accuracy as support vector machines (SVMs) in data classification applications. The proposed learning algorithm works by constructing one RBF subnetwork to approximate the probability density function of each class of objects in the training data set. With respect to algorithm design, the main distinction of the proposed learning algorithm is a novel kernel density estimation algorithm with an average time complexity of O(n log n), where n is the number of samples in the training data set. One important advantage of the proposed learning algorithm, in comparison with the SVM, is that it generally takes far less time to construct a data classifier with an optimized parameter setting. This feature is significant for many contemporary applications, in particular those in which new objects are continuously added into an already large database. Another desirable feature of the proposed learning algorithm is that the RBF networks constructed can carry out data classification with more than two classes of objects in a single run. In other words, unlike with the SVM, there is no need to resort to mechanisms such as one-against-one or one-against-all for handling data sets with more than two classes of objects. The comparison with the SVM is of particular interest because a number of recent studies have shown that SVMs are generally able to deliver higher classification accuracy than other existing data classification algorithms. As the proposed learning algorithm is instance-based, the data reduction issue is also addressed in this paper. One interesting observation in this regard is that, for all three data sets used in the data reduction experiments, the number of training samples remaining after a naive data reduction mechanism is applied is quite close to the number of support vectors identified by the SVM software. This paper also compares the performance of the RBF networks constructed with the proposed learning algorithm and those constructed with a conventional cluster-based learning algorithm. The most interesting observation is that, with respect to data classification, the distributions of training samples near the boundaries between different classes of objects carry more crucial information than the distributions of samples in the inner parts of the clusters.
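The abstract does not give implementation details, but the core idea it describes, one RBF subnetwork per class acting as a kernel density estimator and classification by the largest estimated class-conditional density, can be illustrated with a minimal sketch. The Python code below is an assumption-laden simplification: the class name KDEClassifier, the single global bandwidth parameter, and the toy data are hypothetical, and the paper's O(n log n) kernel density estimation algorithm (which would determine the actual RBF centers and widths) is not reproduced here.

# Minimal sketch, NOT the authors' implementation: one RBF "subnetwork"
# per class estimates that class's density; a query point is assigned to
# the class with the largest prior-weighted density estimate.
import numpy as np

class KDEClassifier:
    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth  # assumed single global bandwidth (simplification)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One instance-based subnetwork per class: kernels centered at training samples.
        self.centers_ = [X[y == c] for c in self.classes_]
        self.priors_ = np.array([len(c) for c in self.centers_], dtype=float)
        self.priors_ /= self.priors_.sum()
        return self

    def _log_density(self, X, centers):
        # Gaussian RBF kernels centered at the training samples of one class.
        d = X[:, None, :] - centers[None, :, :]                # (m, n_c, dim)
        sq = np.sum(d * d, axis=-1) / (2.0 * self.bandwidth ** 2)
        dim = X.shape[1]
        log_norm = -0.5 * dim * np.log(2.0 * np.pi * self.bandwidth ** 2)
        # log of the average kernel response = log p(x | class)
        return log_norm + np.log(np.mean(np.exp(-sq), axis=1) + 1e-300)

    def predict(self, X):
        scores = np.stack(
            [self._log_density(X, c) + np.log(p)
             for c, p in zip(self.centers_, self.priors_)], axis=1)
        # Multi-class decision in a single run: pick the most probable class.
        return self.classes_[np.argmax(scores, axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X0 = rng.normal(loc=-2.0, scale=1.0, size=(200, 2))
    X1 = rng.normal(loc=2.0, scale=1.0, size=(200, 2))
    X2 = rng.normal(loc=(0.0, 4.0), scale=1.0, size=(200, 2))
    X = np.vstack([X0, X1, X2])
    y = np.array([0] * 200 + [1] * 200 + [2] * 200)
    clf = KDEClassifier(bandwidth=0.8).fit(X, y)
    print(np.mean(clf.predict(X) == y))   # training accuracy on toy three-class data

Note that the three-class toy problem is handled in a single pass, with no one-against-one or one-against-all decomposition, which is the property the abstract contrasts with the SVM.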
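The "naive data reduction mechanism" mentioned in the abstract is not specified there, so the following sketch is only one plausible interpretation of a boundary-oriented reduction rule: keep a training sample if any of its k nearest neighbors belongs to a different class. The function name reduce_to_boundary_samples and the default k=5 are hypothetical.

# Hypothetical illustration of a boundary-oriented reduction heuristic;
# the actual mechanism used in the paper is not described in the abstract.
import numpy as np

def reduce_to_boundary_samples(X, y, k=5):
    n = len(X)
    keep = np.zeros(n, dtype=bool)
    for i in range(n):
        dist = np.sum((X - X[i]) ** 2, axis=1)
        dist[i] = np.inf                        # exclude the sample itself
        neighbors = np.argsort(dist)[:k]
        keep[i] = np.any(y[neighbors] != y[i])  # retained only if near a class boundary
    return X[keep], y[keep]

Under the abstract's observation, a set reduced this way would play a role loosely analogous to the SVM's support vectors, since both concentrate on samples near the boundaries between classes.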