Li Hangyu, Wang Nannan, Yang Xi, Gao Xinbo
IEEE Trans Image Process. 2022;31:4637-4650. doi: 10.1109/TIP.2022.3186536. Epub 2022 Jul 12.
Existing facial expression recognition (FER) methods train encoders with different large-scale training data for specific FER applications. In this paper, we propose a new task in this field: pre-training a general encoder that extracts facial expression representations without fine-tuning. To tackle this task, we extend self-supervised contrastive learning to pre-train a general encoder for facial expression analysis. Specifically, given a batch of facial expressions, positive and negative pairs are first constructed based on coarse-grained labels and an FER-specific data augmentation strategy. Second, we propose coarse-contrastive (CRS-CONT) learning, in which the features of positive pairs are pulled together while being pushed away from the features of negative pairs. Moreover, one key issue is that an excessive constraint on the coarse-grained feature distribution will hurt fine-grained FER applications. To address this, a weight vector is designed to control the optimization of CRS-CONT learning. As a result, a well-trained general encoder with frozen weights can readily adapt to different facial expressions and support linear evaluation on any target dataset. Extensive experiments on both in-the-wild and in-the-lab FER datasets show that our method achieves superior or comparable performance against state-of-the-art FER methods, especially on unseen facial expressions and in cross-dataset evaluation. We hope this work will help reduce the training burden and offer a new alternative to fully-supervised feature learning with fine-grained labels. Code and the general encoder will be publicly available at https://github.com/hangyu94/CRS-CONT.