College of Mathematics and Informatics, South China Agricultural University, China.
College of Mathematics and Informatics, South China Agricultural University, China; Key Laboratory of Smart Agricultural Technology in Tropical South China, Ministry of Agriculture and Rural Affairs, China.
Neural Netw. 2024 Dec;180:106696. doi: 10.1016/j.neunet.2024.106696. Epub 2024 Sep 3.
Despite significant advances in deep clustering research, most existing approaches share three critical limitations. First, they often derive the clustering result by attaching a distribution-based loss to specific network layers, neglecting the potential benefits of contrastive sample-wise relationships. Second, they frequently focus on representation learning at the full-image scale, overlooking the discriminative information latent in partial image regions. Third, although some prior studies perform learning at multiple levels, they mostly fail to exploit the interaction between different learning levels. To overcome these limitations, this paper presents a novel deep image clustering approach via Partial Information discrimination and Cross-level Interaction (PICI). Specifically, we use a Transformer encoder as the backbone, coupled with two types of augmentations to form two parallel views. The augmented samples, integrated with masked patches, are processed through the Transformer encoder to produce the class tokens. Subsequently, three partial information learning modules are jointly enforced: the partial information self-discrimination (PISD) module for masked image reconstruction, the partial information contrastive discrimination (PICD) module for simultaneous instance- and cluster-level contrastive learning, and the cross-level interaction (CLI) module to ensure consistency across different learning levels. Through this unified formulation, our PICI approach is, to our knowledge, the first to bridge the gap between masked image modeling and deep contrastive clustering, offering a novel pathway for enhanced representation learning and clustering. Experimental results across six image datasets demonstrate the superiority of our PICI approach over state-of-the-art methods.
In particular, our approach achieves an ACC of 0.772 (0.634) on the RSOD (UC-Merced) dataset, which shows an improvement of 29.7% (24.8%) over the best baseline. The source code is available at https://github.com/Regan-Zhang/PICI.
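The abstract names masked patches and simultaneous instance- and cluster-level contrastive learning without giving formulas, so the following is only a minimal sketch of how such objectives are commonly written, not the authors' implementation (see the linked repository for that). It assumes an NT-Xent style loss, which the abstract does not specify. Every name here (`random_patch_mask`, `nt_xent`, `contrastive_losses`), the mask ratio, and the temperature values are hypothetical and chosen for illustration. The cross-level interaction module and the reconstruction loss are not sketched.

```python
import math
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a 0/1 list marking which patches are hidden (1 = masked)."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [1 if i in masked else 0 for i in range(num_patches)]


def _cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def nt_xent(view_a, view_b, temperature):
    """NT-Xent loss (assumed form): row i of view_a and row i of view_b
    are a positive pair; all other rows of both views are negatives."""
    n = len(view_a)
    rows = view_a + view_b  # 2n vectors in total
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the matching row in the other view
        logits = [_cosine(rows[i], rows[j]) / temperature
                  for j in range(2 * n) if j != i]
        pos_logit = _cosine(rows[i], rows[pos]) / temperature
        # Numerically stable log-sum-exp over all non-self similarities.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def contrastive_losses(z_a, z_b, p_a, p_b, tau_instance=0.5, tau_cluster=1.0):
    """Instance-level loss contrasts samples (rows of the feature matrices);
    cluster-level loss contrasts clusters (columns of the soft assignments)."""
    instance_loss = nt_xent(z_a, z_b, tau_instance)
    cluster_loss = nt_xent(transpose(p_a), transpose(p_b), tau_cluster)
    return instance_loss, cluster_loss
```

In this sketch, `z_a` and `z_b` stand for per-sample features from the two augmented views, and `p_a` and `p_b` for per-sample soft cluster assignments. Transposing the assignment matrices is what turns the same loss into a cluster-level one, since each column then describes one cluster across the batch.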