Behnam Gholami, Mostafa El-Khamy, Kee-Bong Song
IEEE Trans Image Process. 2023;32:5751-5763. doi: 10.1109/TIP.2023.3321511. Epub 2023 Oct 24.
Despite remarkable success in a variety of computer vision applications, it is well known that deep learning can fail catastrophically when presented with out-of-distribution data, where there are usually style differences between the training and test images. To address this challenge, we consider the domain generalization problem, wherein predictors are trained using data drawn from a family of related training (source) domains and then evaluated on a distinct and unseen test domain. Naively training a model on the aggregate set of data (pooled from all source domains) has been shown to perform suboptimally, since the information learned by that model might be domain-specific and generalize imperfectly to test domains. Data augmentation has been shown to be an effective approach to overcoming this problem. However, its application has been limited to enforcing invariance to simple transformations such as rotation and brightness change. Such perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input (such as a change in the image style). In this paper, taking advantage of multiple source domains, we propose a novel approach to express and formalize robustness to this kind of real-world image perturbation. The three key ideas underlying our formulation are (1) leveraging disentangled representations of the images to define different factors of variation, (2) generating perturbed images by changing the factors composing the representations of the images, and (3) enforcing the learner (classifier) to be invariant to such changes in the images. We use image-to-image translation models to demonstrate the efficacy of this approach. Based on this, we propose a domain-invariant regularization (DIR) loss function that enforces invariant prediction of targets (class labels) across domains, which yields improved generalization performance.
We demonstrate the effectiveness of our approach on several widely used datasets for the domain generalization problem, on all of which our results are competitive with the state-of-the-art.
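The DIR idea described above can be illustrated with a minimal sketch: penalize the classifier whenever its predictions on an image and on a style-perturbed (e.g., image-to-image translated) version of that image disagree. The symmetric KL divergence used here is one plausible choice of consistency measure, and `dir_loss` is a hypothetical helper name; the paper's exact formulation may differ.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dir_loss(logits_orig, logits_pert):
    """Domain-invariant regularization sketch: symmetric KL divergence
    between class predictions on an image and on its perturbed
    (style-translated) counterpart, averaged over the batch."""
    p = softmax(logits_orig)
    q = softmax(logits_pert)
    eps = 1e-12  # guard against log(0)
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))

# Identical predictions on both versions incur no penalty.
z = np.array([[2.0, 0.5, -1.0]])
print(dir_loss(z, z))   # ≈ 0.0
# Diverging predictions are penalized, pushing the
# classifier toward style-invariant outputs.
z_pert = np.array([[-1.0, 0.5, 2.0]])
print(dir_loss(z, z_pert) > 0.0)
```

In training, a term like this would be added (with a weight) to the usual classification loss, so that the classifier is rewarded for predicting the same label regardless of the style factor.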