IEEE Trans Image Process. 2021;30:2016-2028. doi: 10.1109/TIP.2021.3049955. Epub 2021 Jan 21.
Facial expression recognition is important in criminal investigation and digital entertainment. Under unconstrained conditions, existing expression datasets are highly class-imbalanced, and different expressions closely resemble one another. Previous methods tend to improve recognition performance through deeper or wider network structures, which increases storage and computing costs. In this paper, we propose a new adaptive supervised objective named AdaReg loss, which re-weights category importance coefficients to address this class imbalance and increases the discriminative power of expression representations. Inspired by human cognition, we design a coarse-fine (C-F) labels strategy that guides the model from easy to difficult cases so that it can classify highly similar representations. On this basis, we propose a novel training framework named the emotional education mechanism (EEM) to transfer knowledge, composed of a knowledgeable teacher network (KTN) and a self-taught student network (STSN). Specifically, KTN integrates the outputs of coarse and fine streams, learning expression representations from easy to difficult. Under the supervision of the pre-trained KTN and existing learning experience, STSN can maximize the potential performance and compress the original KTN. Extensive experiments on public benchmarks demonstrate that the proposed method achieves superior performance compared to current state-of-the-art frameworks, with 88.07% on RAF-DB, 63.97% on AffectNet and 90.49% on FERPlus.
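The abstract does not give the AdaReg formula or the exact KTN-to-STSN transfer objective, so the following is only a rough sketch of the two general ideas it names: re-weighting classes to counter imbalance, and training a student under a teacher's supervision. It assumes inverse-frequency class weights and a temperature-softened KL distillation term as generic stand-ins. All function names and the `alpha` and `temperature` parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels, num_classes):
    """Assumed re-weighting: weight_c = N / (K * count_c).

    A generic stand-in for class re-weighting, NOT the AdaReg coefficients.
    Classes absent from `labels` get weight 0.0.
    """
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits, label, class_weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -class_weights[label] * math.log(probs[label])


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    This is the common soft-target distillation term, assumed here for
    illustration only.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def student_loss(student_logits, teacher_logits, label, class_weights,
                 alpha=0.5, temperature=2.0):
    """Hypothetical student objective: weighted hard-label term plus soft term."""
    hard = weighted_cross_entropy(student_logits, label, class_weights)
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    # Toy imbalanced label set: class 0 dominates, class 2 is rare.
    toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
    weights = inverse_frequency_weights(toy_labels, num_classes=3)
    print(weights)  # -> [0.5, 1.5, 3.0]
    print(student_loss([0.2, 0.1, 1.5], [0.0, 0.3, 2.0], 2, weights))
```

In this toy setup the rare class receives a weight six times that of the dominant class, so its errors contribute more to the loss. The `alpha` parameter trades off the hard-label term against the teacher's soft targets.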