Ozawa Seiichi, Pang Shaoning, Kasabov Nikola
Graduate School of Engineering, Kobe University, Nada-ku, Kobe 657-8501, Japan.
IEEE Trans Neural Netw. 2008 Jun;19(6):1061-74. doi: 10.1109/TNN.2007.2000059.
This paper presents a pattern classification system in which feature extraction and classifier learning are carried out simultaneously, not only online but also in one pass, where each training sample is presented only once. For this purpose, we have extended incremental principal component analysis (IPCA) and effectively combined several classifier models with it. However, this approach had a drawback: because of a limitation of IPCA, training samples had to be learned one by one. To overcome this problem, we propose another extension of IPCA, called chunk IPCA, in which a chunk of training samples is processed at a time. In the experiments, we evaluate the classification performance on several large-scale data sets to examine the scalability of chunk IPCA in one-pass incremental learning environments. The experimental results suggest that chunk IPCA can effectively reduce training time compared with IPCA unless the number of input attributes is too large. We study how the size of the initial training data and the size of each chunk influence classification accuracy and learning time. We also show that chunk IPCA approximates the major eigenvectors fairly well.
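To make the chunk-wise idea concrete, the following is a minimal NumPy sketch of merging a whole chunk of samples into an eigenspace model in a single step. It is a generic eigenspace-merging update written for illustration, not the authors' chunk IPCA: the function names `init_model` and `chunk_update`, the fixed number of retained axes `k`, and the use of the biased covariance are all assumptions. The model keeps only the mean, the retained eigenvectors and eigenvalues, and the sample count, so each chunk is seen once and then discarded.

```python
import numpy as np


def init_model(X0, k):
    """Build an eigenspace model (mean, U, lam, n) from an initial batch X0 (n x d)."""
    mean = X0.mean(axis=0)
    C = np.cov(X0, rowvar=False, bias=True)   # biased covariance (divides by n)
    lam, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:k]          # keep the k largest
    return mean, V[:, order], lam[order], X0.shape[0]


def chunk_update(model, Y, k):
    """Merge a chunk Y (m x d) into the model in one step, keeping at most k axes.

    Illustrative sketch only: the old covariance is approximated by
    U diag(lam) U^T, so anything outside the retained axes is lost.
    """
    mean, U, lam, n = model
    m = Y.shape[0]
    N = n + m
    y_mean = Y.mean(axis=0)
    Yc = Y - y_mean                 # chunk centred on its own mean
    diff = mean - y_mean            # shift between the old mean and the chunk mean

    # Orthonormal basis spanning the old axes, the centred chunk samples and
    # the mean shift; the merged covariance lies entirely in this subspace.
    Q, _ = np.linalg.qr(np.column_stack([U, Yc.T, diff]))

    # Merged covariance projected onto Q:
    # C' = (n * U diag(lam) U^T + Yc^T Yc + (n*m/N) diff diff^T) / N
    A = Q.T @ U
    P = Yc @ Q
    g = Q.T @ diff
    small = (n * (A * lam) @ A.T + P.T @ P + (n * m / N) * np.outer(g, g)) / N

    # Solve the small eigenproblem, then rotate back into the input space.
    mu, R = np.linalg.eigh(small)
    order = np.argsort(mu)[::-1][:k]
    new_mean = (n * mean + m * y_mean) / N
    return new_mean, Q @ R[:, order], mu[order], N


# Usage: 20 initial samples, then chunks of 30, each presented only once.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
model = init_model(X[:20], k=5)
for start in range(20, 200, 30):
    model = chunk_update(model, X[start:start + 30], k=5)
print(model[2])   # eigenvalues after all chunks
```

When `k` equals the input dimension, this merge is exact and reproduces the batch eigenvalues. With a smaller `k`, the variance outside the retained axes is dropped at every step, so the major eigenvectors are only approximated. The QR step works on a d-row matrix, so its cost grows with the number of input attributes.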