Yang Jianfei, Yang Jiangang, Wang Shizheng, Cao Shuxin, Zou Han, Xie Lihua
IEEE Trans Cybern. 2023 Feb;53(2):1106-1117. doi: 10.1109/TCYB.2021.3093888. Epub 2023 Jan 13.
Unsupervised domain adaptation methods have been proposed to tackle covariate shift by minimizing the distribution discrepancy between the feature embeddings of the source and target domains. However, the standard evaluation protocols assume that the conditional label distributions of the two domains are invariant, an assumption that is usually inconsistent with real-world scenarios such as long-tailed distributions of visual categories. In this article, imbalanced domain adaptation (IDA) is formulated for a more realistic scenario in which both label shift and covariate shift occur between the two domains. Theoretically, when label shift exists, aligning the marginal distributions may result in negative transfer. Therefore, a novel cluster-level discrepancy minimization (CDM) method is developed. CDM uses cross-domain similarity learning to learn tight and discriminative clusters, which are used for both feature-level and distribution-level discrepancy minimization, mitigating the negative effect of label shift during domain transfer. Theoretical justifications further demonstrate that CDM minimizes the target risk in a progressive manner. To corroborate the effectiveness of CDM, we propose two evaluation protocols based on real-world situations and benchmark existing domain adaptation approaches. Extensive experiments demonstrate that negative transfer does occur due to label shift, while our approach achieves significant improvement on imbalanced datasets, including Office-31, Image-CLEF, and Office-Home.
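The claim that aligning marginal distributions under label shift may cause negative transfer can be illustrated with a toy calculation. The following is a minimal sketch, not the CDM method from the article: it assumes one-dimensional features, two classes whose class-conditional means are identical in both domains, and hypothetical class priors chosen only for illustration. All function and variable names are invented for this example.

```python
# Toy illustration (assumed setup, not the article's algorithm):
# class-conditional feature means are identical in source and target,
# but the class priors differ (label shift).

def marginal_mean(class_means, priors):
    """Mean of the marginal feature distribution, a prior-weighted sum of class means."""
    return sum(p * m for p, m in zip(priors, class_means))

def class_gaps(src_means, tgt_means):
    """Per-class absolute gap between source and target class means."""
    return [abs(s - t) for s, t in zip(src_means, tgt_means)]

class_means = [-1.0, 1.0]      # shared by both domains (hypothetical values)
source_priors = [0.5, 0.5]     # balanced source labels (hypothetical)
target_priors = [0.9, 0.1]     # imbalanced target labels (hypothetical)

src_marginal = marginal_mean(class_means, source_priors)
tgt_marginal = marginal_mean(class_means, target_priors)
marginal_gap = abs(src_marginal - tgt_marginal)

# "Align" the marginals by translating target features so the means match.
shift = src_marginal - tgt_marginal
shifted_target_means = [m + shift for m in class_means]

gaps_before = class_gaps(class_means, class_means)
gaps_after = class_gaps(class_means, shifted_target_means)

print("marginal gap before alignment:", marginal_gap)
print("per-class gaps before alignment:", gaps_before)
print("per-class gaps after alignment:", gaps_after)
```

In this setup the classes are already aligned across domains, yet the marginal means differ purely because of the priors. Forcing the marginal means to match moves every target class away from its source counterpart, which is the kind of effect that a class-level or cluster-level discrepancy is meant to avoid.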