
Training hard-margin support vector machines using greedy stagewise algorithm.

Authors

Bo Liefeng, Wang Ling, Jiao Licheng

Affiliation

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, the Institute of Intelligent Information Processing, Xidian University, Xi'an, Shaanxi 710071, P. R. China. blf0218@163.com

Publication

IEEE Trans Neural Netw. 2008 Aug;19(8):1446-55. doi: 10.1109/TNN.2008.2000576.

Abstract

Hard-margin support vector machines (HM-SVMs) are prone to overfitting in the presence of noise. Soft-margin SVMs address this problem by introducing a regularization term and achieve state-of-the-art performance; however, this remedy incurs a relatively high computational cost. In this paper, an alternative method, a greedy stagewise algorithm for SVMs, named GS-SVMs, is presented to cope with the overfitting of HM-SVMs without employing a regularization term. The most attractive property of GS-SVMs is that its worst-case computational complexity scales only quadratically with the number of training samples. Experiments on large data sets with up to 400,000 training samples demonstrate that GS-SVMs can be faster than LIBSVM 2.83 without sacrificing accuracy. Finally, we employ statistical learning theory to analyze the empirical results, which shows that the success of GS-SVMs lies in the fact that its early stopping rule acts as an implicit regularization term.

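The abstract gives no pseudocode, so the sketch below is only a rough illustration of the general greedy stagewise idea it describes, not the authors' exact GS-SVMs procedure: at each stage, the worst-violating training sample is added to a kernel expansion with a fixed-size coefficient step, and a stage cap plays the role of an early-stopping rule. The RBF kernel, the step rule, and all function names here are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Pairwise RBF (Gaussian) kernel matrix.
    d = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * d)

def greedy_stagewise_fit(X, y, gamma=1.0, max_stages=50):
    """Illustrative greedy stagewise kernel fit (NOT the paper's exact
    algorithm): each stage adds the worst-violating sample as a basis
    function; the stage cap acts as an early-stopping rule."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)      # O(n^2) precomputation
    alpha = np.zeros(n)              # coefficients on selected basis samples
    f = np.zeros(n)                  # current decision values on training set
    for _ in range(max_stages):
        margins = y * f
        i = np.argmin(margins)       # worst-violating training sample
        if margins[i] >= 1.0:        # every sample has margin >= 1: done
            break
        alpha[i] += y[i]             # stagewise step on a single basis
        f += y[i] * K[:, i]          # O(n) incremental update of decisions
    return alpha

# Toy usage on two separable Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (20, 2)), rng.normal(1, 0.3, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
alpha = greedy_stagewise_fit(X, y)
pred = np.sign(rbf_kernel(X, X) @ alpha)
acc = np.mean(pred == y)
```

Under these assumptions the cost profile is consistent with the quadratic worst-case scaling the abstract claims: precomputing `K` is O(n²), and each of at most n stages performs an O(n) update.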
