
Maximum margin clustering made practical.

Author information

Zhang Kai, Tsang Ivor W, Kwok James T

Affiliation

Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.

Publication information

IEEE Trans Neural Netw. 2009 Apr;20(4):583-96. doi: 10.1109/TNN.2008.2010620. Epub 2009 Mar 6.

Abstract

Motivated by the success of large margin methods in supervised learning, maximum margin clustering (MMC) is a recent approach that aims at extending large margin methods to unsupervised learning. However, its optimization problem is nonconvex and existing MMC methods all rely on reformulating and relaxing the nonconvex optimization problem as semidefinite programs (SDP). Though SDP is convex and standard solvers are available, they are computationally very expensive and only small data sets can be handled. To make MMC more practical, we avoid SDP relaxations and propose in this paper an efficient approach that performs alternating optimization directly on the original nonconvex problem. A key step to avoid premature convergence in the resultant iterative procedure is to change the loss function from the hinge loss to the Laplacian/square loss so that overconfident predictions are penalized. Experiments on a number of synthetic and real-world data sets demonstrate that the proposed approach is more accurate, much faster (hundreds to tens of thousands of times faster), and can handle data sets that are hundreds of times larger than the largest data set reported in the MMC literature.

