He Li, Zhu Haifei, Zhang Tao, Yang Honghong, Guan Yisheng
Department of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China.
Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada.
Entropy (Basel). 2018 Jul 10;20(7):519. doi: 10.3390/e20070519.
In kernel methods, the Nyström approximation is a popular way of calculating out-of-sample extensions and can be further applied to large-scale data clustering and classification tasks. Given a new data point, Nyström employs its empirical affinity vector, k, for calculation. This vector is assumed to be a proper measurement of the similarity between the new point and the training set. In this paper, we suggest replacing the affinity vector by its projections on the leading eigenvectors learned from the training set, i.e., using k* = ∑_{i=1}^{c} (k^T u_i) u_i instead, where u_i is the i-th eigenvector of the training set and c is the number of eigenvectors used, which is typically equal to the number of classes specified by the user. Our work is motivated by the constraints that in kernel space, the kernel-mapped new point should (a) also lie on the unit sphere defined by the Gaussian kernel and (b) generate training set affinity values close to k. These two constraints define a Quadratic Optimization Over a Sphere (QOOS) problem. In this paper, we prove that the projection on the leading eigenvectors, rather than the original affinity vector, is the solution to the QOOS problem. The experimental results show that the proposed replacement of k by k* slightly improves the performance of the Nyström approximation. Compared with other affinity matrix modification methods, our k* obtains comparable or higher clustering performance in terms of accuracy and Normalized Mutual Information (NMI).
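The projection step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the toy data, the Gaussian kernel width gamma, and the choice of c are all assumptions made for the example; only the projection formula k* = ∑_{i=1}^{c} (k^T u_i) u_i is taken from the abstract.

```python
import numpy as np

# --- Assumed toy setup (not from the paper) ---
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))   # training set: 50 points in 2D
gamma = 0.5                    # Gaussian kernel width (assumed)
c = 3                          # number of leading eigenvectors (assumed)

# Gaussian (RBF) affinity matrix of the training set
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)

# Leading eigenvectors of the training affinity matrix
# (eigh returns eigenvalues in ascending order, so take the last c)
eigvals, eigvecs = np.linalg.eigh(K)
U = eigvecs[:, -c:]            # columns u_1, ..., u_c

# Empirical affinity vector k between a new point and the training set
x_new = rng.normal(size=2)
k = np.exp(-gamma * ((X - x_new) ** 2).sum(axis=-1))

# Proposed replacement: k* = sum_i (k^T u_i) u_i,
# i.e. the projection of k onto the span of the leading eigenvectors
k_star = U @ (U.T @ k)
```

Because k* is an orthogonal projection of k, projecting it a second time leaves it unchanged, and k* can then be fed to the Nyström extension in place of k.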