IEEE Trans Image Process. 2023;32:3442-3454. doi: 10.1109/TIP.2023.3276708. Epub 2023 Jun 19.
Self-supervised learning enables networks to learn discriminative features from massive unlabeled data. Most state-of-the-art methods maximize the similarity between two augmentations of the same image via contrastive learning; by exploiting the consistency between the two augmentations, the burden of manual annotation is removed. Contrastive learning thus uses instance-level information to learn robust features, but the learned information may be confined to different views of the same instance. In this paper, we instead leverage the similarity between two distinct images to strengthen representations in self-supervised learning; compared with instance-level information, this cross-image similarity may carry more useful information. We further analyze the relation between the similarity loss and the feature-level cross-entropy loss, two losses that are essential to most deep learning methods yet whose relationship remains unclear. The similarity loss helps obtain instance-level representations, while the feature-level cross-entropy loss helps mine the similarity between two distinct images. We provide theoretical analyses and experiments showing that a suitable combination of these two losses achieves state-of-the-art results. Code is available at https://github.com/guijiejie/ICCL.
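The abstract pairs an instance-level similarity loss with a feature-level cross-entropy loss. The exact ICCL formulation is given in the paper; the sketch below is only an illustrative guess at such a combination, using negative cosine similarity between two augmented views and a cross-entropy computed over the batch dimension of the features (the function names, the batch-axis softmax, and the weighting `alpha` are all assumptions, not the authors' definitions):

```python
import numpy as np

def similarity_loss(z1, z2):
    """Instance-level loss: negative cosine similarity between two views.

    z1, z2: (batch, dim) embeddings of two augmentations of the same images.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    return -np.mean(np.sum(z1 * z2, axis=1))

def feature_cross_entropy_loss(z1, z2):
    """Assumed feature-level cross-entropy: softmax over the batch axis per
    feature dimension, relating distinct images to each other."""
    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)
    p = softmax(z1, axis=0)  # distribution over images, per feature
    q = softmax(z2, axis=0)
    return -np.mean(np.sum(p * np.log(q + 1e-8), axis=0))

def combined_loss(z1, z2, alpha=0.5):
    """Weighted combination of the two losses (alpha is a hypothetical knob)."""
    return similarity_loss(z1, z2) + alpha * feature_cross_entropy_loss(z1, z2)
```

If the two views coincide, the similarity term reaches its minimum of -1, while the feature-level term reduces to the entropy of the per-feature batch distribution.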