IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):5549-5560. doi: 10.1109/TPAMI.2022.3203630. Epub 2023 Apr 3.
Representation learning has advanced significantly with the development of contrastive learning methods. Most of these methods benefit from data augmentations that are carefully designed to preserve instance identity, so that images transformed from the same instance can still be retrieved. However, such carefully designed transformations prevent us from further exploring the novel patterns exposed by other transformations. Meanwhile, as shown in our experiments, directly applying contrastive learning to strongly augmented images does not learn representations effectively. We therefore propose a general framework, Contrastive Learning with Stronger Augmentations (CLSA), to complement current contrastive learning approaches. Here, the distribution divergence between the weakly and strongly augmented images over a representation bank is adopted to supervise the retrieval of strongly augmented queries from a pool of instances. Experiments on the ImageNet dataset and downstream datasets show that the information from strongly augmented images can significantly boost performance. For example, CLSA achieves 76.2% top-1 accuracy on ImageNet with a standard ResNet-50 architecture and a fine-tuned single-layer classifier, which is nearly on par with the 76.5% of the supervised result.
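The following is a minimal sketch, not the authors' released implementation, of the distributional supervision described in the abstract: the similarity distribution of a strongly augmented query over a representation bank is pushed toward the distribution produced by its weakly augmented counterpart. The names (`q_weak`, `q_strong`, `bank`, `tau`) and the MoCo-style memory bank of key embeddings are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def ddm_loss(q_weak, q_strong, bank, tau=0.2):
    """Distribution-divergence supervision (illustrative sketch):
    match the strong view's similarity distribution over the bank
    to the weak view's distribution, with the weak side detached."""
    q_weak = F.normalize(q_weak, dim=1)
    q_strong = F.normalize(q_strong, dim=1)
    bank = F.normalize(bank, dim=1)

    # Similarity distributions over the representation bank.
    p_weak = F.softmax(q_weak @ bank.t() / tau, dim=1).detach()      # target
    log_p_strong = F.log_softmax(q_strong @ bank.t() / tau, dim=1)   # prediction

    # Cross-entropy between the weak (target) and strong distributions.
    return -(p_weak * log_p_strong).sum(dim=1).mean()

# Example usage with random tensors standing in for encoder outputs.
loss = ddm_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(4096, 128))
```

In practice such a term would be added to a standard contrastive (e.g., InfoNCE) loss on the weakly augmented views rather than used alone.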