Wang Yulin, Huang Gao, Song Shiji, Pan Xuran, Xia Yitong, Wu Cheng
IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3733-3748. doi: 10.1109/TPAMI.2021.3052951. Epub 2022 Jun 3.
Data augmentation is widely known as a simple yet surprisingly effective technique for regularizing deep networks. Conventional data augmentation schemes, e.g., flipping, translation or rotation, are low-level, data-independent and class-agnostic operations, leading to limited diversity for augmented samples. To address this limitation, we propose a novel semantic data augmentation algorithm to complement traditional approaches. The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features, i.e., certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., changing the background or view angle of an object. Based on this observation, translating training samples along many such directions in the feature space can effectively augment the dataset with greater diversity. To implement this idea, we first introduce a sampling-based method to obtain semantically meaningful directions efficiently. Then, an upper bound of the expected cross-entropy (CE) loss on the augmented training set is derived by letting the number of augmented samples go to infinity, yielding a highly efficient algorithm. In fact, we show that the proposed implicit semantic data augmentation (ISDA) algorithm amounts to minimizing a novel robust CE loss, which adds minimal extra computational cost to a normal training procedure. In addition to supervised learning, ISDA can be applied to semi-supervised learning tasks under the consistency regularization framework, where ISDA amounts to minimizing the upper bound of the expected KL-divergence between the augmented features and the original features. Despite its simplicity, ISDA consistently improves the generalization performance of popular deep models (e.g., ResNets and DenseNets) on a variety of datasets, including CIFAR-10, CIFAR-100, SVHN, ImageNet, and Cityscapes.
Code for reproducing our results is available at https://github.com/blackfeather-wang/ISDA-for-Deep-Networks.
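To make the abstract's central claim concrete: the robust CE loss that implicitly performs the augmentation can be computed by adding, for each sample, a quadratic penalty to every non-target logit based on the class-conditional feature covariance. The NumPy sketch below illustrates this form of the upper bound; all function and variable names here are our own illustrative choices (the paper's notation uses features a, final-layer weights w, biases b, and per-class covariances Σ), and a practical implementation would estimate the covariances online during training.

```python
import numpy as np

def isda_robust_ce(features, weights, biases, labels, class_covs, lam):
    """Sketch of the ISDA robust cross-entropy upper bound.

    features   : (N, D) deep features for a mini-batch
    weights    : (C, D) final-layer weight rows, one per class
    biases     : (C,)   final-layer biases
    labels     : (N,)   integer class labels
    class_covs : (C, D, D) per-class feature covariance estimates
    lam        : augmentation strength lambda (lam = 0 recovers standard CE)
    """
    N, _ = features.shape
    logits = features @ weights.T + biases        # (N, C) standard logits
    losses = np.empty(N)
    for i in range(N):
        y = labels[i]
        delta_w = weights - weights[y]            # (C, D) rows w_j - w_y
        # quadratic penalty (lambda/2) * (w_j - w_y)^T Sigma_y (w_j - w_y);
        # it is zero for the target class j = y
        quad = 0.5 * lam * np.einsum('cd,de,ce->c',
                                     delta_w, class_covs[y], delta_w)
        z = logits[i] + quad                      # augmented logits
        z = z - z.max()                           # numerical stability
        losses[i] = -(z[y] - np.log(np.exp(z).sum()))
    return losses.mean()
```

Because the covariance matrices are positive semi-definite, the penalty is non-negative, so the loss is never smaller than the standard CE loss it upper-bounds; setting `lam = 0` reduces it exactly to ordinary cross-entropy.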