IEEE Trans Image Process. 2021;30:9359-9371. doi: 10.1109/TIP.2021.3124674. Epub 2021 Nov 16.
Domain adversarial training has become a prevailing and effective paradigm for unsupervised domain adaptation (UDA). To successfully align the multi-modal data structures across domains, follow-up works exploit discriminative information in the adversarial training process, e.g., by using multiple class-wise discriminators or by involving conditional information in the input or output of the domain discriminator. However, these methods either require non-trivial model designs or are inefficient for UDA tasks. In this work, we attempt to address this dilemma by devising simple and compact conditional domain adversarial training methods. We first revisit the simple concatenation conditioning strategy, where features are concatenated with output predictions as the input of the discriminator. We find that the concatenation strategy suffers from weak conditioning strength. We further demonstrate that enlarging the norm of the concatenated predictions effectively strengthens conditional domain alignment. We therefore improve concatenation conditioning by normalizing the output predictions to have the same norm as the features, and term the derived method Normalized OutpUt coNditioner (NOUN). However, because NOUN conditions domain alignment on raw output predictions, it suffers from inaccurate predictions on the target domain. To address this, we propose to condition the cross-domain feature alignment in the prototype space rather than in the output space. Combining this novel prototype-based conditioning with NOUN, we term the enhanced method PROtotype-based Normalized OutpUt coNditioner (PRONOUN). Experiments on both object recognition and semantic segmentation show that NOUN can effectively align the multi-modal structures across domains and even outperform state-of-the-art domain adversarial training methods. Together with prototype-based conditioning, PRONOUN further improves the adaptation performance over NOUN on multiple object recognition benchmarks for UDA.
Code is available at https://github.com/tim-learn/NOUN.
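The core idea of NOUN's conditioning, as described in the abstract, is to rescale the classifier's output predictions so their norm matches that of the features before concatenating the two as the discriminator input. A minimal sketch of that normalization step, assuming batched feature and prediction matrices (function name and shapes are illustrative, not taken from the authors' code):

```python
import numpy as np

def noun_condition(features, predictions, eps=1e-8):
    """Sketch of NOUN-style concatenation conditioning.

    features:    (batch, d) feature vectors from the backbone
    predictions: (batch, c) classifier output predictions

    Each prediction vector is rescaled to have the same L2 norm as its
    corresponding feature vector, then concatenated with the feature to
    form the domain discriminator's input.
    """
    feat_norm = np.linalg.norm(features, axis=1, keepdims=True)
    pred_norm = np.linalg.norm(predictions, axis=1, keepdims=True)
    # Rescale predictions so their norm matches the feature norm,
    # strengthening the conditioning signal relative to plain concatenation.
    scaled_preds = predictions * feat_norm / (pred_norm + eps)
    return np.concatenate([features, scaled_preds], axis=1)
```

Plain concatenation would pass `predictions` through unscaled; since softmax outputs have norm at most 1 while features typically have much larger norms, the unscaled predictions contribute little to the discriminator, which is the "weak conditioning strength" the abstract refers to.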