Tian Yingjie, Ju Xuchan, Shi Yong
Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China.
Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 101408, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China.
Neural Netw. 2016 Mar;75:12-21. doi: 10.1016/j.neunet.2015.11.008. Epub 2015 Nov 27.
The Nonparallel Support Vector Machine (NPSVM), which is more flexible and generalizes better than the typical SVM, is widely used for classification. Although solvers and toolboxes such as SMO and LIBSVM can be applied to NPSVM, it is hard to scale up to millions of samples. In this paper, we propose a divide-and-combine method for large-scale nonparallel support vector machines (DCNPSVM). In the division step, DCNPSVM partitions the samples into smaller subsets so that the corresponding subproblems can be solved independently. We prove theoretically and verify experimentally that the objective function value, solutions, and support vectors obtained by DCNPSVM are close to those of the whole NPSVM problem. In the combination step, the sub-solutions are combined into an initial iterate from which the whole problem is solved by global coordinate descent, which converges quickly. To balance accuracy and efficiency, we adopt a multi-level structure that outperforms state-of-the-art methods. Moreover, DCNPSVM can handle imbalanced problems efficiently by tuning its parameters. Experimental results on many large data sets demonstrate the effectiveness of our method in terms of memory usage, classification accuracy, and running time.
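The divide-and-combine scheme described above can be illustrated with a minimal sketch. This is not the paper's actual NPSVM solver: it substitutes a simple Pegasos-style subgradient-descent linear SVM for the nonparallel formulation, and toy two-dimensional data for a large-scale set. The names (`train`, `accuracy`, partition sizes, learning-rate and regularization values) are all illustrative assumptions. What it shows is the flow: solve subproblems on disjoint sample chunks independently, combine the sub-solutions into an initial iterate, then run a short warm-started global pass.

```python
import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train(data, w=None, b=0.0, lam=0.001, eta=0.1, epochs=20):
    """Pegasos-style subgradient descent for a hinge-loss linear SVM.

    Passing (w, b) warm-starts the solver, mirroring the combination step
    where sub-solutions seed the global optimization. Stand-in for the
    paper's coordinate-descent NPSVM solver.
    """
    if w is None:
        w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:                            # y in {-1, +1}
            w = [wi * (1 - eta * lam) for wi in w]   # L2 regularization shrink
            if y * (dot(w, x) + b) < 1:              # hinge loss active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def accuracy(data, w, b):
    return sum(1 for x, y in data if (dot(w, x) + b) * y > 0) / len(data)

# Toy two-class data (a stand-in for a large-scale problem).
data = ([([2 + random.uniform(-0.5, 0.5), 2 + random.uniform(-0.5, 0.5)], +1)
         for _ in range(40)] +
        [([-2 + random.uniform(-0.5, 0.5), -2 + random.uniform(-0.5, 0.5)], -1)
         for _ in range(40)])
random.shuffle(data)

# Division step: solve independent subproblems on disjoint sample chunks.
half = len(data) // 2
subs = [train(data[:half]), train(data[half:])]

# Combination step: average the sub-solutions into an initial iterate,
# then run a short warm-started global pass over all samples.
w0 = [sum(ws[i] for ws, _ in subs) / len(subs) for i in range(2)]
b0 = sum(bs for _, bs in subs) / len(subs)
w, b = train(data, w=w0, b=b0, epochs=5)

print(f"accuracy after combine + global pass: {accuracy(data, w, b):.2f}")
```

Because the sub-solutions are already close to the global solution (the property the paper proves for DCNPSVM), the warm-started global pass needs only a few epochs to converge, which is the source of the method's speedup.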