School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China; SYSU-CMU Shunde International Joint Research Institute, Shunde, China.
School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China; Guangdong Province Key Laboratory of Computational Science, Guangzhou 510275, China.
Neural Netw. 2015 Mar;63:117-32. doi: 10.1016/j.neunet.2014.11.003. Epub 2014 Nov 27.
Kernel competitive learning (KCL) has been successfully used to achieve robust clustering. However, KCL does not scale to large data, because (1) it must compute and store the full kernel matrix, which is too large to fit in memory, and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large-scale datasets. The proposed framework consists of two parts. First, it derives an approximate kernel competitive learning (AKCL) method, which performs kernel competitive learning in a subspace obtained via sampling. We provide a solid theoretical analysis of why the proposed approximation works for kernel competitive learning, and we further show that the computational complexity of AKCL is greatly reduced. Second, we propose a pseudo-parallel approximate kernel competitive learning (PAKCL) method based on a set-based kernel competitive learning strategy, which overcomes the obstacle to parallel programming in kernel competitive learning and significantly accelerates approximate kernel competitive learning for large-scale clustering. Empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL perform comparably to KCL, with a large reduction in computational cost. The proposed methods also achieve higher clustering precision than related approximate clustering approaches.
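The abstract does not include code, but the core idea of AKCL can be illustrated with a short sketch: instead of forming the full n x n kernel matrix, evaluate the kernel only against a small sampled landmark set and run the competitive (winner-take-all) updates in the coefficient space spanned by those landmarks. The sketch below is an assumption-laden illustration, not the paper's algorithm: the function name `akcl_sketch`, the RBF kernel, the learning-rate schedule, and the Nystrom-style projection of the input onto the landmark span are all choices made here for concreteness.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between row sets A (n x d) and B (m x d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def akcl_sketch(X, n_clusters=3, n_landmarks=50, gamma=0.5,
                lr=0.05, epochs=10, seed=0):
    # NOTE: illustrative sketch only; the paper's actual AKCL sampling
    # scheme and update rule may differ.
    rng = np.random.default_rng(seed)
    # Sample landmarks: the kernel is only ever evaluated against this
    # subset, so the full n x n kernel matrix is never formed.
    Z = X[rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)]
    Kzz = rbf(Z, Z, gamma)
    Kzz_inv = np.linalg.pinv(Kzz + 1e-8 * np.eye(len(Z)))
    # Each prototype is a coefficient vector over the landmark features.
    A = rng.normal(scale=0.1, size=(n_clusters, len(Z)))
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            kx = rbf(x[None, :], Z, gamma).ravel()  # k(x, z_i) for all landmarks
            # Squared feature-space distance to each prototype:
            # k(x, x) - 2 a . k_x + a^T Kzz a, with k(x, x) = 1 for RBF.
            quad = np.einsum('ij,jk,ik->i', A, Kzz, A)
            d = 1.0 - 2.0 * (A @ kx) + quad
            w = np.argmin(d)  # winner-take-all step of competitive learning
            # Project phi(x) onto the landmark span (Nystrom-style) and
            # move the winning prototype toward it.
            A[w] = (1 - lr) * A[w] + lr * (Kzz_inv @ kx)
    # Assign every point to its nearest prototype in feature space.
    Kxz = rbf(X, Z, gamma)
    quad = np.einsum('ij,jk,ik->i', A, Kzz, A)
    D = 1.0 - 2.0 * (Kxz @ A.T) + quad[None, :]
    return D.argmin(axis=1)

# Toy usage: three Gaussian blobs on the real line, embedded in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in (-2.0, 0.0, 2.0)])
labels = akcl_sketch(X, n_clusters=3)
```

Per-point cost here is O(m) kernel evaluations plus O(m^2) arithmetic for m landmarks, rather than O(n) kernel evaluations against the whole dataset, which is the complexity reduction the abstract refers to. A set-based strategy in the spirit of PAKCL could plausibly be layered on top by splitting the data into subsets and running the winner search over each subset in parallel before applying the prototype updates, though the paper's concrete batching scheme is not specified in the abstract.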