IEEE Trans Cybern. 2014 Jan;44(1):1-20. doi: 10.1109/TSMCB.2012.2236828. Epub 2013 Jun 18.
Kernel methods such as the standard support vector machine and support vector regression trainings take O(N(3)) time and O(N(2)) space complexities in their naïve implementations, where N is the training set size. It is thus computationally infeasible in applying them to large data sets, and a replacement of the naive method for finding the quadratic programming (QP) solutions is highly desirable. By observing that many kernel methods can be linked up with kernel density estimate (KDE) which can be efficiently implemented by some approximation techniques, a new learning method called fast KDE (FastKDE) is proposed to scale up kernel methods. It is based on establishing a connection between KDE and the QP problems formulated for kernel methods using an entropy-based integrated-squared-error criterion. As a result, FastKDE approximation methods can be applied to solve these QP problems. In this paper, the latest advance in fast data reduction via KDE is exploited. With just a simple sampling strategy, the resulted FastKDE method can be used to scale up various kernel methods with a theoretical guarantee that their performance does not degrade a lot. It has a time complexity of O(m(3)) where m is the number of the data points sampled from the training set. Experiments on different benchmarking data sets demonstrate that the proposed method has comparable performance with the state-of-art method and it is effective for a wide range of kernel methods to achieve fast learning in large data sets.
核方法,如标准支持向量机和支持向量回归训练,在其朴素实现中,时间复杂度为 O(N(3)),空间复杂度为 O(N(2)),其中 N 是训练集的大小。因此,将其应用于大数据集是不切实际的,非常需要一种替代朴素方法来寻找二次规划 (QP) 解的方法。通过观察到许多核方法可以与核密度估计 (KDE) 联系起来,而 KDE 可以通过一些近似技术有效地实现,因此提出了一种称为快速 KDE (FastKDE) 的新学习方法来扩展核方法。它基于在基于熵的积分均方误差准则下,在 KDE 和为核方法制定的 QP 问题之间建立联系。结果,可以应用 FastKDE 近似方法来解决这些 QP 问题。在本文中,利用了通过 KDE 进行快速数据缩减的最新进展。通过简单的采样策略,所得到的 FastKDE 方法可以用于扩展各种核方法,理论上保证其性能不会大幅下降。它的时间复杂度为 O(m(3)),其中 m 是从训练集中采样的数据点的数量。在不同基准数据集上的实验表明,该方法具有与最先进方法相当的性能,并且对于广泛的核方法在大数据集中实现快速学习是有效的。