IEEE Trans Neural Netw Learn Syst. 2016 Jul;27(7):1514-26. doi: 10.1109/TNNLS.2015.2448653. Epub 2015 Jul 27.
Nonnegative matrix factorization (NMF) and symmetric NMF (SymNMF) have been shown to be effective for clustering linearly separable data and nonlinearly separable data, respectively. Nevertheless, many practical applications call for constrained clustering algorithms that can exploit a small number of pairwise constraints, given as must-link and cannot-link pairs. In this paper, we propose an NMF-based constrained clustering framework in which the similarity between two points joined by a must-link is forced to approximate 1 and the similarity between two points joined by a cannot-link is forced to approximate 0. We then formulate the framework with NMF and with SymNMF to cluster linearly separable data and nonlinearly separable data, respectively. Furthermore, we present multiplicative update rules to solve the resulting problems and show their correctness and convergence. Experimental results on various text data sets, University of California, Irvine (UCI) data sets, and gene expression data sets demonstrate the superiority of our algorithms over existing constrained clustering algorithms.
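The abstract states the idea (must-link similarities pushed toward 1, cannot-link similarities pushed toward 0, solved with multiplicative updates) but does not give the update rules themselves. The following is therefore only a minimal sketch of that idea, not the paper's algorithm. It makes two simplifying assumptions: the constraints are folded directly into the similarity matrix by overwriting the constrained entries, and the factorization uses a commonly used damped multiplicative update for SymNMF, H <- H * (1 - beta + beta * (A H) / (H H^T H)) with beta = 0.5. All function names (`build_similarity`, `symnmf`, `cluster_labels`), the toy similarity matrix, and the constraint pairs are hypothetical. Plain Python lists are used to keep the sketch dependency-free.

```python
import random


def build_similarity(base, must_link, cannot_link):
    """Copy a symmetric similarity matrix and overwrite the constrained pairs.

    Assumption of this sketch: must-link pairs get similarity 1 and
    cannot-link pairs get similarity 0, as described in the abstract.
    """
    A = [row[:] for row in base]
    for i, j in must_link:
        A[i][j] = A[j][i] = 1.0
    for i, j in cannot_link:
        A[i][j] = A[j][i] = 0.0
    return A


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def symnmf(A, k, iters=500, beta=0.5, eps=1e-9, seed=0):
    """Approximate A by H H^T with H >= 0 using a damped multiplicative update.

    This is a generic SymNMF update, not the rule derived in the paper.
    """
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        AH = matmul(A, H)                             # n x k
        HHtH = matmul(H, matmul(transpose(H), H))     # n x k
        H = [
            [H[i][c] * (1.0 - beta + beta * AH[i][c] / (HHtH[i][c] + eps))
             for c in range(k)]
            for i in range(n)
        ]
    return H


def cluster_labels(H):
    """Assign each point to the column of H with the largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]


# Toy data: six points in two loose groups, {0, 1, 2} and {3, 4, 5}.
base = [
    [1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1) for j in range(6)]
    for i in range(6)
]
A = build_similarity(base, must_link=[(0, 2)], cannot_link=[(2, 3)])
H = symnmf(A, k=2)
labels = cluster_labels(H)
print(labels)
```

Overwriting entries of the similarity matrix is the simplest way to inject pairwise constraints. A penalty term on the constrained entries of H H^T, weighted against the reconstruction error, would be closer in spirit to the framework the abstract describes, but its update rules would have to be taken from the paper itself.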