IEEE Trans Neural Netw Learn Syst. 2018 Apr;29(4):792-806. doi: 10.1109/TNNLS.2017.2648038. Epub 2017 Jan 20.
Reducing samples through convex hull vertices selection (CHVS) within each class is an important and effective method for online classification problems, since the classifier can be trained rapidly on the selected samples. However, CHVS is NP-hard. In this paper, we propose a fast algorithm for selecting the convex hull vertices, based on convex hull decomposition and the projection property. In the proposed algorithm, the quadratic minimization problem of computing the distance between a point and a convex hull is converted into a linear equation problem with low computational complexity. When the data dimension is high, an approximate, rather than exact, convex hull may be selected by setting an appropriate termination condition, so that more unimportant samples can be deleted. In addition, the impact of outliers is considered, and the proposed algorithm is improved by removing outliers in the initialization procedure. Furthermore, a dimension conversion technique via the kernel trick is used to handle nonlinearly separable problems. An upper bound is proved theoretically for the difference between the support vector machine trained on the selected approximate convex hull vertices and the one trained on all training samples. Experimental results on both synthetic and real data sets demonstrate the effectiveness and validity of the proposed algorithm.
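The quadratic minimization problem the abstract refers to is the standard formulation of the distance from a point p to the convex hull of a set of vertices V: minimize ||V^T w - p||^2 subject to w >= 0 and sum(w) = 1. The sketch below, assuming NumPy and SciPy, solves this baseline QP directly with a general-purpose solver; it is not the authors' fast linear-equation method, only the problem that method accelerates.

```python
import numpy as np
from scipy.optimize import minimize

def dist_to_hull(point, vertices):
    """Distance from `point` to conv(vertices), via the QP:
    min_w ||V^T w - p||^2  s.t.  w >= 0, sum(w) = 1.
    This is the quadratic formulation the paper's algorithm replaces
    with a cheaper linear-equation computation."""
    V = np.asarray(vertices, dtype=float)   # (n_vertices, dim)
    p = np.asarray(point, dtype=float)
    n = V.shape[0]
    # Objective: squared distance between the convex combination and p.
    obj = lambda w: np.sum((V.T @ w - p) ** 2)
    # Simplex constraints: weights are nonnegative and sum to one.
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * n
    w0 = np.full(n, 1.0 / n)                # start at the centroid weights
    res = minimize(obj, w0, bounds=bounds, constraints=cons, method="SLSQP")
    return np.sqrt(max(res.fun, 0.0))

# Unit square hull; (2, 0.5) lies distance 1 from the edge x = 1,
# while an interior point has distance 0.
square = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
print(round(dist_to_hull([2.0, 0.5], square), 3))
print(round(dist_to_hull([0.5, 0.5], square), 3))
```

Solving this QP once per candidate point is what makes exact CHVS expensive; the paper's contribution is avoiding the general QP solver entirely.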