Pan Haolin, Guo Yong, Yu Mianjie, Chen Jian
IEEE Trans Image Process. 2024;33:4215-4230. doi: 10.1109/TIP.2024.3425148. Epub 2024 Jul 22.
Real-world data often follows a long-tailed distribution, in which a few head classes account for most of the data and a large number of tail classes contain only a few samples each. In practice, deep models often generalize poorly on tail classes because of this imbalance. Data augmentation, which synthesizes new samples for tail classes, has become an effective remedy. One popular augmentation is CutMix, which explicitly mixes images of tail classes with other images and constructs the labels according to the ratio of the areas cropped from the two images. However, area-based labels entirely ignore the inherent semantic information of the augmented samples, often leading to misleading training signals. To address this issue, we propose Contrastive CutMix (ConCutMix), which constructs augmented samples with semantically consistent labels to boost the performance of long-tailed recognition. Specifically, we compute the similarities between samples in the semantic space learned by contrastive learning and use them to rectify the area-based labels. Experiments show that ConCutMix significantly improves the accuracy on tail classes as well as the overall performance. For example, based on ResNeXt-50, we improve the overall accuracy on ImageNet-LT by 3.0%, thanks to a significant improvement of 3.3% on tail classes. We highlight that the improvement also generalizes well to other benchmarks and models. Our code and pretrained models are available at https://github.com/PanHaulin/ConCutMix.
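The idea in the abstract can be illustrated with a minimal NumPy sketch. The first part builds a CutMix sample with the usual area-based label. The second part shows one possible way to rectify that label with semantic similarities. The abstract does not give the rectification formula, so `rectify_label`, the class centers, the blend weight `alpha`, and the `temperature` are all hypothetical choices made for illustration and should not be read as the authors' method; the random embedding stands in for the output of a contrastively trained encoder.

```python
import numpy as np


def rand_box(height, width, lam, rng):
    """Sample a box covering roughly (1 - lam) of the image, as in CutMix."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = int(rng.integers(height)), int(rng.integers(width))
    top = int(np.clip(cy - cut_h // 2, 0, height))
    bottom = int(np.clip(cy + cut_h // 2, 0, height))
    left = int(np.clip(cx - cut_w // 2, 0, width))
    right = int(np.clip(cx + cut_w // 2, 0, width))
    return top, bottom, left, right


def cutmix(img_a, label_a, img_b, label_b, num_classes, lam, rng):
    """Paste a box from img_b into img_a; the label follows the area ratio."""
    height, width = img_a.shape[:2]
    top, bottom, left, right = rand_box(height, width, lam, rng)
    mixed = img_a.copy()
    mixed[top:bottom, left:right] = img_b[top:bottom, left:right]
    # Fraction of the mixed image that actually comes from img_b.
    area_b = (bottom - top) * (right - left) / (height * width)
    area_label = np.zeros(num_classes)
    area_label[label_a] += 1.0 - area_b
    area_label[label_b] += area_b
    return mixed, area_label


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def rectify_label(area_label, embedding, class_centers, alpha=0.5, temperature=0.1):
    """Hypothetical rectification (an assumption, not the paper's formula):
    blend the area-based label with a label derived from cosine similarity
    between the mixed sample's embedding and per-class centers."""
    emb = embedding / np.linalg.norm(embedding)
    centers = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    semantic_label = softmax(centers @ emb / temperature)
    return (1.0 - alpha) * area_label + alpha * semantic_label


rng = np.random.default_rng(0)
img_a = rng.random((32, 32, 3))          # stand-in for a tail-class image
img_b = rng.random((32, 32, 3))          # stand-in for another image
mixed, area_label = cutmix(img_a, 0, img_b, 2, num_classes=3, lam=0.7, rng=rng)

embedding = rng.normal(size=8)           # stand-in for a contrastive embedding
class_centers = rng.normal(size=(3, 8))  # hypothetical per-class centers
final_label = rectify_label(area_label, embedding, class_centers)
```

Both `area_label` and `final_label` are probability vectors over the classes. The difference is that `final_label` can shift mass toward the class the mixed image actually resembles in the embedding space, rather than depending on pixel area alone.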