

Hybrid Attention Cascade Network for Facial Expression Recognition.

Affiliations

National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China.

National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China.

Publication information

Sensors (Basel). 2021 Mar 12;21(6):2003. doi: 10.3390/s21062003.

Abstract

As a sub-challenge of EmotiW (the Emotion Recognition in the Wild challenge), improving performance on the AFEW (Acted Facial Expressions in the Wild) dataset is a popular benchmark for emotion recognition under various in-the-wild constraints, including uneven illumination, head deflection, and varied facial posture. In this paper, we propose a convenient facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, the face in each frame of a video sequence is detected, and the corresponding face ROI (region of interest) is extracted to obtain the face image. The face images are then aligned across frames using the positions of the facial feature points. Second, the aligned face images are fed into a residual neural network to extract the spatial features of the facial expressions. These spatial features are passed through the hybrid attention module to obtain fused facial expression features. Finally, the fused features are fed into a gated recurrent unit to extract the temporal features of the facial expressions, and the temporal features are passed to a fully connected layer to classify and recognize the expressions. Experiments on the CK+ (Extended Cohn-Kanade), Oulu-CASIA (Institute of Automation, Chinese Academy of Sciences), and AFEW datasets yielded recognition accuracies of 98.46%, 87.31%, and 53.44%, respectively. This demonstrates that the proposed method not only achieves performance competitive with state-of-the-art methods but also improves accuracy on the AFEW dataset by more than 2%, confirming its effectiveness for facial expression recognition in natural environments.
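The cascade described above (per-frame spatial features from a residual network → hybrid attention fusion → gated recurrent unit for temporal features → fully connected classifier) can be sketched with NumPy stand-ins. All dimensions, the channel-wise attention weighting, and the random weights below are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes; the abstract does not specify them.
T, D, H, C = 16, 512, 128, 7  # frames, spatial dim, GRU hidden dim, expression classes

# 1) Stand-in for per-frame spatial features from the residual network.
feats = rng.standard_normal((T, D))

# 2) Hybrid-attention stand-in: channel-wise softmax weights reweight each
#    frame's features to produce the fused features (illustrative only).
w_att = rng.standard_normal(D)
fused = feats * softmax(feats * w_att, axis=-1)  # shape (T, D)

# 3) Minimal gated recurrent unit over the fused sequence (temporal features).
Wz, Wr, Wh = (0.1 * rng.standard_normal((H, D + H)) for _ in range(3))
h = np.zeros(H)
for x in fused:
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh)                              # update gate
    r = sigmoid(Wr @ xh)                              # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([x, r * h]))  # candidate state
    h = (1.0 - z) * h + z * h_cand

# 4) Fully connected layer + softmax yields class probabilities.
W_fc = 0.1 * rng.standard_normal((C, H))
probs = softmax(W_fc @ h)
print(probs.shape)  # (7,) — one probability per expression class
```

In a trained model the attention and GRU weights would be learned end-to-end, and the spatial features would come from the ResNet backbone rather than random arrays; this sketch only shows how the tensor shapes flow through the cascade.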


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5881/8002145/275eab45bf34/sensors-21-02003-g001.jpg
