Behnam Gholami, Mostafa El-Khamy, Kee-Bong Song
IEEE Trans Image Process. 2023;32:5751-5763. doi: 10.1109/TIP.2023.3321511. Epub 2023 Oct 24.
Despite remarkable success in a variety of computer vision applications, it is well known that deep learning can fail catastrophically when presented with out-of-distribution data, where there are usually style differences between the training and test images. To address this challenge, we consider the domain generalization problem, wherein predictors are trained using data drawn from a family of related training (source) domains and then evaluated on a distinct and unseen test domain. Naively training a model on the aggregate set of data (pooled from all source domains) has been shown to perform suboptimally, since the information learned by that model might be domain-specific and generalize imperfectly to test domains. Data augmentation has been shown to be an effective approach to overcoming this problem. However, its application has been limited to enforcing invariance to simple transformations such as rotation and brightness change. Such perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input (such as a change in the image style). In this paper, taking advantage of multiple source domains, we propose a novel approach to express and formalize robustness to this kind of real-world image perturbation. The three key ideas underlying our formulation are (1) leveraging disentangled representations of the images to define different factors of variation, (2) generating perturbed images by changing the factors composing the representations of the images, and (3) enforcing the learner (classifier) to be invariant to such changes in the images. We use image-to-image translation models to demonstrate the efficacy of this approach. Based on this, we propose a domain-invariant regularization (DIR) loss function that enforces invariant prediction of targets (class labels) across domains, which yields improved generalization performance.
We demonstrate the effectiveness of our approach on several widely used datasets for the domain generalization problem, on all of which our results are competitive with the state-of-the-art.
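The DIR idea described above can be illustrated with a minimal sketch: penalize the classifier whenever its predictions on an image and on a style-perturbed (e.g., image-to-image translated) version of that image disagree. The symmetric KL divergence used here is one plausible choice of consistency measure, and `dir_loss` is a hypothetical helper name; the paper's exact formulation may differ.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dir_loss(logits_orig, logits_pert):
    """Domain-invariant regularization sketch: symmetric KL divergence
    between class predictions on an image and on its perturbed
    (style-translated) counterpart, averaged over the batch."""
    p = softmax(logits_orig)
    q = softmax(logits_pert)
    eps = 1e-12  # guard against log(0)
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))

# Identical predictions on both versions incur no penalty.
z = np.array([[2.0, 0.5, -1.0]])
print(dir_loss(z, z))   # ≈ 0.0
# Diverging predictions are penalized, pushing the
# classifier toward style-invariant outputs.
z_pert = np.array([[-1.0, 0.5, 2.0]])
print(dir_loss(z, z_pert) > 0.0)
```

In training, a term like this would be added (with a weight) to the usual classification loss, so that the classifier is rewarded for predicting the same label regardless of the style factor.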