Department of Electrical and Computer Engineering, University of Ulsan, Ulsan 44610, Korea.
Sensors (Basel). 2020 May 5;20(9):2639. doi: 10.3390/s20092639.
Facial expression recognition (FER) is a challenging problem in the fields of pattern recognition and computer vision. The recent success of convolutional neural networks (CNNs) in object detection and object segmentation tasks has shown promise in building an automatic deep CNN-based FER model. However, in real-world scenarios, performance degrades dramatically owing to the great diversity of factors unrelated to facial expressions, the lack of training data, and the intrinsic class imbalance in existing facial emotion datasets. To tackle these problems, this paper not only applies deep transfer learning techniques, but also proposes a novel loss function, called weighted-cluster loss, which is used during the fine-tuning phase. Specifically, the weighted-cluster loss function simultaneously improves the intra-class compactness and the inter-class separability by learning a class center for each emotion class. It also takes the imbalance in a facial expression dataset into account by giving each emotion class a weight based on its proportion of the total number of images. In addition, a recent, successful deep CNN architecture, pre-trained on the task of face identification with the VGGFace2 database from the Visual Geometry Group at Oxford University, is employed and fine-tuned using the proposed loss function to recognize eight basic facial emotions from the AffectNet database of facial expression, valence, and arousal computing in the wild. Experiments on the AffectNet real-world facial dataset demonstrate that our method outperforms the baseline CNN models that use either weighted-softmax loss or center loss.
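The abstract does not give the exact formula for the weighted-cluster loss, but its two stated ingredients, a learned center per emotion class (as in center loss) and a per-class weight derived from each class's share of the training images, can be sketched as follows. This is a minimal illustrative NumPy version under those assumptions; the function name, the inverse-frequency weighting scheme, and the squared-Euclidean distance term are all hypothetical stand-ins for the paper's actual definition, which may differ (e.g., in how centers are updated or how separability between centers is enforced).

```python
import numpy as np

def weighted_cluster_loss(features, labels, centers, class_counts):
    """Hypothetical sketch of a weighted-cluster loss: each deep feature is
    pulled toward its own class center (intra-class compactness), and each
    class is weighted by the inverse of its frequency so that rare emotion
    classes are not drowned out by common ones (imbalance handling).

    features:     (N, D) deep feature vectors from the CNN
    labels:       (N,)   integer class labels in [0, C)
    centers:      (C, D) learned class centers
    class_counts: (C,)   number of training images per class
    """
    counts = np.asarray(class_counts, dtype=float)
    # Inverse-frequency class weights, rescaled so they sum to C
    # (so a perfectly balanced dataset gives every class weight 1.0).
    weights = counts.sum() / counts
    weights = weights / weights.sum() * len(counts)
    # Squared Euclidean distance from each sample to its class center,
    # with the conventional 1/2 factor used in center loss.
    diffs = features - centers[labels]
    per_sample = 0.5 * np.sum(diffs ** 2, axis=1)
    return float(np.mean(weights[labels] * per_sample))
```

In a real training loop this term would be added to a (weighted) softmax cross-entropy loss, and the centers would be updated alongside the network weights during fine-tuning; the snippet above only shows the forward computation of the cluster term.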