Qin Yalan, Feng Guorui, Ren Yanli, Zhang Xinpeng
IEEE Trans Cybern. 2023 Feb;53(2):832-844. doi: 10.1109/TCYB.2022.3165550. Epub 2023 Jan 13.
Multiview clustering has received great attention, and numerous subspace clustering algorithms for multiview data have been presented. However, most of these algorithms do not handle high-dimensional data effectively and fail to exploit the consistency of the number of connected components in the similarity matrices of different views. In this article, we propose a novel consistency-induced multiview subspace clustering (CiMSC) method to tackle these issues. It is mainly composed of structural consistency (SC) and sample assignment consistency (SAC). Specifically, SC aims to learn a similarity matrix for each single view in which the number of connected components equals the number of clusters in the dataset. SAC aims to minimize the discrepancy in the number of connected components among the similarity matrices of different views, based on the SAC assumption that different views should produce the same number of connected components in their similarity matrices. CiMSC also learns cluster indicator matrices for different views and shared similarity matrices simultaneously within a single optimization framework. Since each column of a similarity matrix can serve as a new representation of a data point, CiMSC can learn an effective subspace representation for high-dimensional data, which is encoded into the latent representation through nonlinear reconstruction. We employ an alternating optimization scheme to solve the resulting optimization problem. Experiments validate the advantage of CiMSC over 12 state-of-the-art multiview clustering approaches; for example, CiMSC achieves an accuracy of 98.06% on the BBCSport dataset.
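The two consistency notions can be made concrete with a small check. The Python sketch below is not CiMSC's optimization procedure (the abstract does not state its objective); it only counts the connected components of a similarity matrix by breadth-first search, treating any entry above a tolerance `tol` as an edge (an assumption), then tests the SC condition (number of components equals the number of clusters) and computes a hypothetical SAC-style discrepancy, defined here as the gap between the largest and smallest component counts across views. Graph-learning clustering methods often enforce the component count indirectly, through the standard result that the number of zero eigenvalues of the Laplacian L = D - S of a symmetric nonnegative S equals the number of connected components; direct counting is used here only for readability. The function names and toy matrices are illustrative.

```python
from collections import deque
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def count_connected_components(S: Matrix, tol: float = 1e-8) -> int:
    """Count connected components of the graph with weighted adjacency S.

    Assumption: an edge i--j exists when S[i][j] or S[j][i] exceeds tol,
    so S is symmetrized implicitly.
    """
    n = len(S)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        # Every unvisited node starts a new component; BFS marks all of it.
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if not seen[j] and (S[i][j] > tol or S[j][i] > tol):
                    seen[j] = True
                    queue.append(j)
    return components


def structural_consistency(S: Matrix, num_clusters: int) -> bool:
    """SC-style check: does this view's graph have exactly c components?"""
    return count_connected_components(S) == num_clusters


def component_count_discrepancy(views: List[Matrix]) -> int:
    """Hypothetical SAC-style measure: spread of component counts across views."""
    counts = [count_connected_components(S) for S in views]
    return max(counts) - min(counts)


# Toy views of 4 samples: S1 and S2 both split samples into {0, 1} and {2, 3}.
S1 = [[0, .9, 0, 0], [.9, 0, 0, 0], [0, 0, 0, .8], [0, 0, .8, 0]]
S2 = [[0, .5, 0, 0], [.5, 0, 0, 0], [0, 0, 0, .7], [0, 0, .7, 0]]
# S3 connects every pair of samples, so it has a single component.
S3 = [[0 if i == j else .3 for j in range(4)] for i in range(4)]

print(count_connected_components(S1))            # 2
print(structural_consistency(S1, 2))             # True
print(component_count_discrepancy([S1, S2]))     # 0: views agree
print(component_count_discrepancy([S1, S3]))     # 1: views disagree
```

A discrepancy of zero means the views agree on the number of components, which is what the SAC assumption asks for; in an actual learning method this count would have to be replaced by a differentiable surrogate, such as the Laplacian eigenvalue formulation mentioned above.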