Dornaika Fadi, Sun Danyang
IEEE Trans Image Process. 2024;33:205-215. doi: 10.1109/TIP.2023.3336532. Epub 2023 Dec 13.
Cutmix-based data augmentation, which uses a cut-and-paste strategy, has shown remarkable generalization capabilities in deep learning. However, existing methods primarily consider global semantics with image-level constraints, which overly diminishes attention to the class's discriminative local context and leads to a bottleneck in performance improvement. Moreover, existing methods for generating augmented samples usually cut and paste rectangular or square regions, resulting in a loss of object-part information. To mitigate the inconsistency between the augmented image and the generated mixed label, existing methods usually require a double forward propagation or rely on an external pre-trained network for object centering, which is inefficient. To overcome these limitations, we propose LGCOAMix, an efficient context-aware and object-part-aware superpixel-based grid blending method for data augmentation. To the best of our knowledge, this is the first time that a label mixing strategy using a superpixel attention approach has been proposed for cutmix-based data augmentation, and the first instance of learning local features from discriminative superpixel-wise regions and cross-image superpixel contrasts. Extensive experiments on various benchmark datasets show that LGCOAMix outperforms state-of-the-art cutmix-based data augmentation methods on classification tasks and on weakly supervised object localization on CUB200-2011. We demonstrate the effectiveness of LGCOAMix not only for CNNs but also for Transformer networks. Source code is available at https://github.com/DanielaPlusPlus/LGCOAMix.
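For readers unfamiliar with the general idea, the following is a minimal, illustrative sketch of superpixel-region cut-and-paste mixing with area-proportional label mixing. It is not the LGCOAMix algorithm itself (which learns a superpixel-attention-based label mixing and superpixel-wise local features); it only shows how superpixel regions can replace the rectangular patches of standard CutMix. All function and parameter names here are placeholders for this example.

```python
# Illustrative superpixel cut-and-paste mixing (NOT the authors' LGCOAMix method).
import numpy as np
from skimage.segmentation import slic


def superpixel_mix(img_a, img_b, label_a, label_b,
                   n_segments=64, paste_ratio=0.3, rng=None):
    """Paste randomly chosen superpixels of img_b onto img_a.

    img_a, img_b : float arrays of shape (H, W, 3) in [0, 1]
    label_a, label_b : one-hot label vectors of shape (num_classes,)
    Returns the mixed image and a label mixed in proportion to the pasted area.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Segment the source image into superpixels (irregular, content-aware regions
    # instead of the axis-aligned rectangles used by standard CutMix).
    segments = slic(img_b, n_segments=n_segments, start_label=0)
    seg_ids = np.unique(segments)

    # Randomly select a subset of superpixels to cut from img_b.
    n_paste = max(1, int(paste_ratio * len(seg_ids)))
    chosen = rng.choice(seg_ids, size=n_paste, replace=False)
    mask = np.isin(segments, chosen)  # (H, W) boolean paste mask

    # Cut-and-paste: copy the selected superpixel pixels from img_b into img_a.
    mixed = img_a.copy()
    mixed[mask] = img_b[mask]

    # Simple area-proportional label mixing (the CutMix-style rule; LGCOAMix
    # instead weights labels with a superpixel attention mechanism).
    lam = mask.mean()
    mixed_label = (1.0 - lam) * label_a + lam * label_b
    return mixed, mixed_label
```

In this sketch the mixing coefficient is simply the fraction of pasted pixels; the paper's contribution lies in replacing such a purely area-based rule with a context-aware, attention-weighted one, so this snippet should be read only as background on the cut-and-paste paradigm being improved.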