Dong Zhibin, Jin Jiaqi, Xiao Yuyang, Xiao Bin, Wang Siwei, Liu Xinwang, Zhu En
IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):3218-3230. doi: 10.1109/TNNLS.2024.3350671. Epub 2025 Feb 6.
The success of multiview raw data mining relies on the integrity of attributes. However, each view is subject to noise and collection failures, so attributes are often only partially available. To make matters worse, the attributes in multiview raw data take multiple forms, which makes it harder to explore the structure of the data, especially in the multiview clustering task. Because data are missing in some views, clustering on incomplete multiview data faces three challenges: 1) the topology of the missing data across views still needs to be mined; 2) most approaches do not calibrate the completed representations with the information common to multiple views; and 3) the cluster distributions obtained from incomplete views are not aligned with one another in the latent space, which we call the cluster distribution unaligned problem (CDUP). To address these issues, we propose a deep clustering framework based on subgraph propagation and contrastive calibration (SPCC) for incomplete multiview raw data. First, the global structural graph is reconstructed by propagating the subgraphs generated from the complete data of each view. Then, the missing views are completed and calibrated under the guidance of the global structural graph and contrastive learning (CL) between views. In the latent space, we assume that different views have a common cluster representation in the same dimension. In the unsupervised setting, however, the cluster distributions of different views do not correspond to one another, which prevents the completion process from making use of information from other views. Finally, the completed cluster distributions of the different views are aligned by CL, thus solving the CDUP in the latent space. Our method achieves advanced performance on six benchmarks, which validates the effectiveness and superiority of SPCC.
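The CDUP and its contrastive remedy can be illustrated with a small sketch. This is not the SPCC implementation: the abstract does not give the loss, so the code assumes a generic cluster-level InfoNCE objective in which column k of one view's soft assignment matrix is the positive for column k of the other view. The function names, the toy assignment matrices, and the temperature value are all hypothetical.

```python
import math


def column(matrix, j):
    """Return column j of a row-major matrix (the soft membership of cluster j)."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_contrastive_loss(q_a, q_b, temperature=0.5):
    """Assumed cluster-level InfoNCE loss (one direction, view A to view B).

    Cluster column i of view A is treated as the positive for column i of
    view B; every other column of view B is a negative.
    """
    k = len(q_a[0])
    total = 0.0
    for i in range(k):
        logits = [
            cosine(column(q_a, i), column(q_b, j)) / temperature
            for j in range(k)
        ]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]
    return total / k


# Toy soft cluster assignments for 6 samples and 3 clusters in view A.
q_view_a = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.05, 0.90, 0.05],
    [0.10, 0.10, 0.80],
    [0.05, 0.05, 0.90],
]

# View B finds the same grouping, but its cluster indices are permuted.
# This is the unaligned situation: nothing in unsupervised training forces
# "cluster 0" to mean the same thing in both views.
permutation = [1, 2, 0]
q_view_b_misaligned = [[row[p] for p in permutation] for row in q_view_a]
q_view_b_aligned = [row[:] for row in q_view_a]

labels_a = [row.index(max(row)) for row in q_view_a]
labels_b = [row.index(max(row)) for row in q_view_b_misaligned]

loss_aligned = cluster_contrastive_loss(q_view_a, q_view_b_aligned)
loss_misaligned = cluster_contrastive_loss(q_view_a, q_view_b_misaligned)

print("labels, view A:", labels_a)
print("labels, view B:", labels_b)
print("loss when aligned:", round(loss_aligned, 4))
print("loss when misaligned:", round(loss_misaligned, 4))
```

Both views group the samples identically, yet the hard labels disagree index by index, so one view's cluster distribution cannot be used directly to complete the other. Under the assumed objective the loss is lower for the aligned pair, so minimizing it during training pushes corresponding clusters toward the same index.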