IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9887-9903. doi: 10.1109/TPAMI.2021.3131222. Epub 2022 Nov 7.
Facial expression recognition (FER) has received significant attention and seen notable progress over the past decade, but data inconsistencies among different FER datasets greatly hinder the generalization of models learned on one dataset to another. Recently, a series of cross-domain FER (CD-FER) algorithms have been developed to address this issue. Although each claims superior performance, comprehensive and fair comparisons are lacking due to inconsistent choices of source/target datasets and feature extractors. In this work, we first propose to construct a unified CD-FER evaluation benchmark, in which we re-implement well-performing CD-FER algorithms and recently published general domain adaptation algorithms, and ensure that all of them adopt the same source/target datasets and feature extractors for fair CD-FER evaluation. Based on this analysis, we find that most current state-of-the-art algorithms use adversarial learning mechanisms that aim to learn holistic domain-invariant features to mitigate domain shifts. However, these algorithms ignore local features, which are more transferable across different datasets and carry more detailed content for fine-grained adaptation. Therefore, we develop a novel adversarial graph representation adaptation (AGRA) framework that integrates graph representation propagation with adversarial learning to realize effective cross-domain holistic-local feature co-adaptation. Specifically, our framework first builds two graphs to correlate holistic and local regions within each domain and across different domains, respectively. Then, it extracts holistic-local features from the input image and uses learnable per-class statistical distributions to initialize the corresponding graph nodes. Finally, two stacked graph convolution networks (GCNs) are adopted to propagate holistic-local features within each domain to explore their interaction, and across different domains for holistic-local feature co-adaptation. In this way, the AGRA framework can adaptively learn fine-grained domain-invariant features and thus facilitate cross-domain expression recognition. We conduct extensive and fair comparisons on the unified evaluation benchmark and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
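The pipeline described in the abstract (intra-domain and inter-domain graphs over holistic and local nodes, node initialization from learnable per-class statistical distributions, two stacked GCNs, and adversarial domain alignment) can be illustrated with a minimal sketch. The PyTorch-style code below is not the authors' implementation: the module and parameter names (GCNLayer, AGRASketch, GradReverse, num_local_regions), the node-initialization blending scheme, and the pooling choices are illustrative assumptions only.

```python
# Minimal sketch of holistic-local graph propagation with adversarial alignment,
# loosely following the abstract's description. All names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_norm @ H @ W), with a learnable adjacency."""
    def __init__(self, in_dim, out_dim, num_nodes):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)
        # Learnable adjacency correlating holistic and local nodes.
        self.adj = nn.Parameter(torch.eye(num_nodes) + 0.01 * torch.randn(num_nodes, num_nodes))

    def forward(self, h):                       # h: [batch, num_nodes, in_dim]
        a = F.softmax(self.adj, dim=-1)         # row-normalize the adjacency
        return F.relu(torch.matmul(a, self.weight(h)))


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer, a common way to realize adversarial domain alignment."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad


class AGRASketch(nn.Module):
    """Intra-domain GCN followed by inter-domain GCN over holistic + local nodes,
    with per-class statistical distributions used to seed the graph nodes."""
    def __init__(self, feat_dim=64, num_classes=7, num_local_regions=5):
        super().__init__()
        nodes = 1 + num_local_regions                       # one holistic node + local nodes
        # Learnable per-class statistics (e.g. class-wise means) for node initialization.
        self.class_stats = nn.Parameter(torch.randn(num_classes, nodes, feat_dim))
        self.intra_gcn = GCNLayer(feat_dim, feat_dim, nodes)        # within-domain propagation
        self.inter_gcn = GCNLayer(feat_dim, feat_dim, 2 * nodes)    # cross-domain propagation
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.domain_disc = nn.Linear(feat_dim, 2)                   # source vs. target

    def init_nodes(self, feats, class_probs):
        # Blend extracted holistic-local features with per-class statistics weighted by
        # soft class predictions -- one simple way to use the learned class distributions.
        stats = torch.einsum('bc,cnd->bnd', class_probs, self.class_stats)
        return 0.5 * feats + 0.5 * stats

    def forward(self, src_feats, tgt_feats, src_probs, tgt_probs):
        # src_feats / tgt_feats: [batch, 1 + num_local_regions, feat_dim]
        src = self.intra_gcn(self.init_nodes(src_feats, src_probs))
        tgt = self.intra_gcn(self.init_nodes(tgt_feats, tgt_probs))
        joint = self.inter_gcn(torch.cat([src, tgt], dim=1))        # co-adapt across domains
        n = src.size(1)
        src_out = joint[:, :n].mean(dim=1)                          # pool source nodes
        tgt_out = joint[:, n:].mean(dim=1)                          # pool target nodes
        logits = self.classifier(src_out)                           # expression prediction
        dom_in = GradReverse.apply(torch.cat([src_out, tgt_out], dim=0))
        domain_logits = self.domain_disc(dom_in)                    # adversarial domain branch
        return logits, domain_logits
```

In this sketch the classifier is trained on labeled source features while the gradient-reversed domain branch pushes the pooled source and target graph representations toward indistinguishability; the two learnable adjacencies stand in for the intra-domain and inter-domain graphs mentioned in the abstract.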