School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, PR China; Engineering Research Center of Embedded System Integration, Ministry of Education, Xi'an 710129, PR China; National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi'an 710129, PR China.
School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, PR China.
Neural Netw. 2024 Feb;170:337-348. doi: 10.1016/j.neunet.2023.11.033. Epub 2023 Nov 14.
Facial expression recognition (FER) in the wild is challenging because of disturbing factors such as pose variation, occlusion, and illumination variation. Attention mechanisms can mitigate these issues by enhancing expression-relevant information and suppressing expression-irrelevant information. However, most methods apply the same attention mechanism at every network layer, even though the spatial and channel sizes of the feature tensors change from layer to layer. To address this issue, this paper proposes a hierarchical attention network with progressive feature fusion for FER. First, to aggregate diverse complementary features, a diverse feature extraction module built from several feature aggregation blocks is designed to exploit local and global context features, low-level and high-level features, and gradient features that are robust to illumination variation. Second, to fuse these diverse features effectively, a hierarchical attention module (HAM) is designed to progressively enhance discriminative features from key parts of the facial image and suppress task-irrelevant features from disturbing facial regions. Extensive experiments show that the model achieves the best performance among existing FER methods.
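The robustness of gradient features to illumination variation can be illustrated with a minimal sketch. The abstract does not say which gradient operator the model uses, so the Sobel kernels, the helper names (`correlate3x3`, `gradient_magnitude`), and the toy image below are assumptions for illustration only, not the authors' implementation. The sketch shows one narrow property: a uniform brightness offset leaves the gradient magnitude unchanged.

```python
import math

# Sobel kernels: an assumed choice, the abstract does not name the operator.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation over a 2D list of floats."""
    height, width = len(image), len(image[0])
    out = []
    for i in range(height - 2):
        row = []
        for j in range(width - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


def gradient_magnitude(image):
    """Per-pixel gradient magnitude from horizontal and vertical Sobel responses."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [
        [math.hypot(a, b) for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


# Hypothetical 4x4 toy image with a vertical edge (dark left, bright right).
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
# The same image under a uniform brightness offset.
brighter = [[pixel + 0.5 for pixel in row] for row in image]

grad_original = gradient_magnitude(image)
grad_brighter = gradient_magnitude(brighter)

# Each Sobel kernel sums to zero, so a constant offset cancels out.
same = all(
    math.isclose(a, b)
    for row_a, row_b in zip(grad_original, grad_brighter)
    for a, b in zip(row_a, row_b)
)
print(same)
```

The invariance holds only for additive, spatially uniform changes. A multiplicative change in illumination scales the gradient magnitude, and handling that case would need an additional normalization step that this sketch does not include.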