Ge Yixiao, Zhu Feng, Chen Dapeng, Zhao Rui, Wang Xiaogang, Li Hongsheng
IEEE Trans Neural Netw Learn Syst. 2022 May 18;PP. doi: 10.1109/TNNLS.2022.3173489.
Unsupervised domain adaptation (UDA) aims to adapt a model trained on a labeled source-domain dataset to an unlabeled target-domain dataset. UDA for open-set person reidentification (re-ID) is even more challenging, as the identities (classes) of the two domains do not overlap. One major research direction is based on domain translation, which, however, has fallen out of favor in recent years due to inferior performance compared with pseudo-label-based methods. We argue that domain translation has great potential for exploiting valuable source-domain data, but existing methods fail to provide proper regularization of the translation process. Specifically, previous methods focus only on maintaining the identities of the translated images while ignoring the intersample relations during translation. To tackle these challenges, we propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term. During training, the person feature encoder is optimized to model intersample relations on the fly for supervising relation-consistent domain translation, which in turn improves the encoder with informative translated images. The encoder can be further improved with pseudo labels, where source-to-target translated images with ground-truth identities and target-domain images with pseudo identities are jointly used for training. In experiments, our proposed framework achieves state-of-the-art performance on multiple UDA tasks of person re-ID. With the synthetic→real translated images from our structured domain-translation network, we achieved second place in the Visual Domain Adaptation Challenge (VisDA) in 2020.
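The core idea of relation-consistency regularization can be illustrated with a minimal numpy sketch: penalize any change in the pairwise similarity (intersample relation) matrix between a batch of source features and the features of its translated counterpart. The function names and the mean-squared penalty here are illustrative assumptions, not the paper's exact formulation; the actual framework computes relations with a learned encoder and trains the translation network end-to-end.

```python
import numpy as np

def pairwise_cosine(feats):
    # Row-normalize features, then compute the all-pairs cosine
    # similarity matrix (the "intersample relations" of the batch).
    norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return norm @ norm.T

def relation_consistency_loss(src_feats, translated_feats):
    # Hypothetical regularizer: mean-squared difference between the
    # relation matrices of the original and translated batches, so the
    # translation must preserve how samples relate to one another.
    r_src = pairwise_cosine(src_feats)
    r_trans = pairwise_cosine(translated_feats)
    return float(np.mean((r_src - r_trans) ** 2))
```

In training, this term would be added to the usual translation losses so that gradients flow back into the translation network, discouraging it from collapsing or reshuffling identity relations while changing image style.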