
Maximum margin clustering made practical.

Author information

Zhang Kai, Tsang Ivor W, Kwok James T

Affiliation

Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.

Publication information

IEEE Trans Neural Netw. 2009 Apr;20(4):583-96. doi: 10.1109/TNN.2008.2010620. Epub 2009 Mar 6.

Abstract

Motivated by the success of large margin methods in supervised learning, maximum margin clustering (MMC) is a recent approach that aims at extending large margin methods to unsupervised learning. However, its optimization problem is nonconvex and existing MMC methods all rely on reformulating and relaxing the nonconvex optimization problem as semidefinite programs (SDP). Though SDP is convex and standard solvers are available, they are computationally very expensive and only small data sets can be handled. To make MMC more practical, we avoid SDP relaxations and propose in this paper an efficient approach that performs alternating optimization directly on the original nonconvex problem. A key step to avoid premature convergence in the resultant iterative procedure is to change the loss function from the hinge loss to the Laplacian/square loss so that overconfident predictions are penalized. Experiments on a number of synthetic and real-world data sets demonstrate that the proposed approach is more accurate, much faster (hundreds to tens of thousands of times faster), and can handle data sets that are hundreds of times larger than the largest data set reported in the MMC literature.

