Wu Jianlong, Li Zihan, Sun Wei, Yin Jianhua, Nie Liqiang, Lin Zhouchen
IEEE Trans Pattern Anal Mach Intell. 2025 Jul 15;PP. doi: 10.1109/TPAMI.2025.3588239.
Recently, deep clustering methods have achieved remarkable results compared to traditional clustering approaches. However, their performance remains constrained by the absence of annotations. A thought-provoking observation is that there is still a significant gap between deep clustering and semi-supervised classification methods: even with only a few labeled samples, the accuracy of semi-supervised learning is much higher than that of clustering. Given that a small number of samples can be annotated in an unsupervised way, the clustering task can be naturally transformed into a semi-supervised setting, thereby achieving comparable performance. Based on this intuition, we propose ClusMatch, a unified positive and negative pseudo-label learning based semi-supervised learning framework, which is pluggable and can be applied to existing deep clustering methods. Specifically, we first leverage the pre-trained deep clustering network to compute predictions for all samples, and then design specialized selection strategies to pick out a few high-quality samples as labeled samples for supervised learning. For the unselected samples, a novel unified positive and negative pseudo-label learning scheme is introduced to provide additional supervised signals for semi-supervised fine-tuning. We also propose an adaptive positive-negative threshold learning strategy to further enhance the confidence of the generated pseudo-labels. Extensive experiments on six widely-used datasets and one large-scale dataset demonstrate the superiority of our proposed ClusMatch. For example, ClusMatch achieves a significant accuracy improvement of 5.4% over the state-of-the-art method ProPos on average across these six datasets. Source code can be found at https://github.com/XY-ATOE/ClusMatch.
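The pipeline the abstract describes — computing class probabilities with a pre-trained clustering network, then deriving positive pseudo-labels (confident class assignments) and negative pseudo-labels (classes a sample confidently does not belong to) via confidence thresholds — can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, toy data, and the fixed thresholds `tau_pos`/`tau_neg` are placeholders, not the paper's actual implementation, which learns the positive-negative thresholds adaptively during training.

```python
import math

def softmax(logits):
    """Convert raw clustering-head logits to class probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def assign_pseudo_labels(probs, tau_pos=0.95, tau_neg=0.05):
    """Illustrative thresholding scheme (constants are assumptions):
    - positive pseudo-label: the argmax class, kept only if its
      probability exceeds tau_pos (else None, i.e. no positive label);
    - negative pseudo-labels: all classes whose probability falls
      below tau_neg (the sample is confidently NOT in them)."""
    positives, negatives = [], []
    for p in probs:
        pos = max(range(len(p)), key=lambda c: p[c]) if max(p) >= tau_pos else None
        neg = [c for c, pc in enumerate(p) if pc <= tau_neg]
        positives.append(pos)
        negatives.append(neg)
    return positives, negatives

# Toy predictions from a pre-trained clustering head (3 clusters).
logits = [[8.0, 0.5, 0.2],   # confident sample
          [1.2, 1.0, 0.9]]   # uncertain sample
probs = [softmax(l) for l in logits]
pos, neg = assign_pseudo_labels(probs)
# The confident sample gets positive label 0 and negative labels {1, 2};
# the uncertain sample gets neither, contributing no supervised signal here.
```

Samples with a positive pseudo-label act as labeled data for supervised fine-tuning, while negative pseudo-labels supply extra supervision for the remaining samples, which is the division of roles the framework above relies on.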