Wang Fei, Zhao Bin, Zhang Changshui
Department of Automation, Tsinghua University, Beijing, China.
IEEE Trans Neural Netw. 2010 Feb;21(2):319-32. doi: 10.1109/TNN.2009.2036998. Epub 2010 Jan 15.
Maximum margin clustering (MMC) is a newly proposed clustering method which has shown promising performance in recent studies. It extends the computational techniques of support vector machine (SVM) to the unsupervised scenario. Traditionally, MMC is formulated as a nonconvex integer programming problem which makes it difficult to solve. Several methods have been proposed in the literature to solve the MMC problem based on either semidefinite programming (SDP) or alternating optimization. However, these methods are still time demanding when handling large scale data sets, which limits its application in real-world problems. In this paper, we propose a cutting plane maximum margin clustering (CPMMC) algorithm. It first decomposes the nonconvex MMC problem into a series of convex subproblems by making use of the constrained concave-convex procedure (CCCP), then for each subproblem, our algorithm adopts the cutting plane algorithm to solve it. Moreover, we show that the CPMMC algorithm takes O(sn) time to converge with guaranteed accuracy, where n is the number of samples in the data set and s is the sparsity of the data set, i.e., the average number of nonzero features of the data samples. We also derive the multiclass version of our CPMMC algorithm. Experimental evaluations on several real-world data sets show that CPMMC performs better than existing MMC methods, both in efficiency and accuracy.
最大间隔聚类(MMC)是一种新提出的聚类方法,在最近的研究中已显示出良好的性能。它将支持向量机(SVM)的计算技术扩展到无监督场景。传统上,MMC被表述为一个非凸整数规划问题,这使得它难以求解。文献中已经提出了几种基于半定规划(SDP)或交替优化来解决MMC问题的方法。然而,这些方法在处理大规模数据集时仍然耗时,这限制了其在实际问题中的应用。在本文中,我们提出了一种切割平面最大间隔聚类(CPMMC)算法。它首先利用约束凹凸过程(CCCP)将非凸MMC问题分解为一系列凸子问题,然后对于每个子问题,我们的算法采用切割平面算法来求解。此外,我们表明CPMMC算法在保证精度的情况下需要O(sn)时间收敛,其中n是数据集中样本的数量,s是数据集的稀疏度,即数据样本非零特征的平均数量。我们还推导了CPMMC算法的多类版本。对几个真实世界数据集的实验评估表明,CPMMC在效率和准确性方面都比现有的MMC方法表现更好。