Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou, 325035, Zhejiang, China.
The College of Electrical and Information Engineering, Quzhou University, Quzhou, 324000, Zhejiang, China.
Neural Netw. 2024 Nov;179:106557. doi: 10.1016/j.neunet.2024.106557. Epub 2024 Jul 20.
Unsupervised semantic segmentation is important for determining which known category each pixel belongs to, without any annotation. Recent studies have demonstrated promising results by employing a vision transformer backbone pre-trained on an image-level dataset in a self-supervised manner. However, those methods typically depend on complex architectures or meticulously designed inputs. This motivates us to investigate whether a more straightforward approach can serve the same purpose. To avoid over-complication, we introduce a simple Dense Embedding Contrast network (DECNet) for unsupervised semantic segmentation. Specifically, we propose a Nearest Neighbor Similarity strategy (NNS) to establish well-defined positive and negative pairs for dense contrastive learning. In addition, we optimize a contrastive objective named Ortho-InfoNCE to alleviate the false negative problem inherent in contrastive learning, further enhancing the dense representations. Finally, extensive experiments on the COCO-Stuff and Cityscapes datasets demonstrate that our approach outperforms state-of-the-art methods.
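The abstract names two ingredients, nearest-neighbor pairing and an InfoNCE-style contrastive objective, without giving their formulas. The sketch below illustrates only the general idea, under assumptions that do not come from the paper: positives are chosen as the most cosine-similar embedding in a second view, every other embedding is treated as a negative, and the loss is the standard InfoNCE rather than the Ortho-InfoNCE variant the authors propose. The function names, the temperature value, and the toy data are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def nearest_neighbor_positives(query, key):
    """For each query embedding, return the index of its most similar key."""
    sim = l2_normalize(query) @ l2_normalize(key).T  # (N, M) cosine similarities
    return sim.argmax(axis=1)

def info_nce(query, key, pos_idx, temperature=0.1):
    """Standard InfoNCE (assumption: NOT the paper's Ortho-InfoNCE).

    The key at pos_idx[i] is the positive for query i; all other keys
    act as negatives.
    """
    logits = l2_normalize(query) @ l2_normalize(key).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(query)), pos_idx].mean())

# Toy data: 16 dense (patch-level) embeddings of dimension 8, two views.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(16, 8))
view_b = view_a + 0.01 * rng.normal(size=(16, 8))  # slightly perturbed copy

pos = nearest_neighbor_positives(view_a, view_b)
loss = info_nce(view_a, view_b, pos)
```

Because every non-positive key is counted as a negative here, embeddings that belong to the same semantic class as the query are penalized as if they were different. This is the false negative problem the abstract refers to, and the plain loss above does nothing to mitigate it.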