Fu Chenghao, Yang Wenzhong, Chen Danny, Wei Fuyuan
School of Information Science and Engineering, Xinjiang University, Urumqi 830017, China.
Xinjiang Key Laboratory of Multilingual Information Technology, Xinjiang University, Urumqi 830017, China.
Entropy (Basel). 2023 Jul 14;25(7):1064. doi: 10.3390/e25071064.
Micro-expressions are the small, brief facial expression changes that humans momentarily show during emotional experiences, and their annotation is difficult, which makes micro-expression data scarce. To extract salient and discriminative features from limited data, we propose an attention-based multi-scale, multi-modal, multi-branch flow network that thoroughly learns the motion information of micro-expressions by exploiting attention mechanisms and the complementarity between different kinds of optical flow information. First, we extract optical flow information (horizontal optical flow, vertical optical flow, and optical strain) from the onset and apex frames of each micro-expression video, and each branch learns one kind of optical flow information. Second, we propose a multi-scale fusion module that uses spatial attention to focus on locally important information at each scale, yielding richer and more stable feature representations. Then, we design a multi-optical flow feature reweighting module that uses channel attention to adaptively select features for each optical flow branch. Finally, to better integrate the information of the three branches and to alleviate the uneven class distribution of micro-expression samples, we introduce a logarithmically adjusted prior knowledge weighting loss. This loss weights the prediction scores of different categories to mitigate the negative impact of class imbalance during classification. The effectiveness of the proposed model is demonstrated through extensive experiments and feature visualization on three benchmark datasets (CASME II, SAMM, and SMIC), and its performance is comparable to that of state-of-the-art methods.
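The abstract's first step computes horizontal flow, vertical flow, and optical strain from the onset and apex frames. A minimal sketch of that preprocessing is given below; the abstract does not name the flow estimator, so TV-L1 (common in micro-expression work, via opencv-contrib) is an assumption, and the strain is the standard magnitude of the infinitesimal strain tensor derived from the flow field.

```python
import cv2
import numpy as np

def extract_flow_inputs(onset_gray, apex_gray):
    """Compute horizontal flow, vertical flow, and optical strain
    between the onset and apex frames of a micro-expression clip.
    TV-L1 is an assumed choice of estimator, not stated in the abstract."""
    # TV-L1 optical flow (requires opencv-contrib-python).
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()
    flow = tvl1.calc(onset_gray, apex_gray, None)  # H x W x 2, float32
    u, v = flow[..., 0], flow[..., 1]

    # Optical strain: combine the normal strains (du/dx, dv/dy) and the
    # shear strain 0.5*(du/dy + dv/dx) into a per-pixel magnitude map.
    u_x = cv2.Sobel(u, cv2.CV_32F, 1, 0, ksize=3)
    u_y = cv2.Sobel(u, cv2.CV_32F, 0, 1, ksize=3)
    v_x = cv2.Sobel(v, cv2.CV_32F, 1, 0, ksize=3)
    v_y = cv2.Sobel(v, cv2.CV_32F, 0, 1, ksize=3)
    shear = 0.5 * (u_y + v_x)
    strain = np.sqrt(u_x**2 + v_y**2 + 2.0 * shear**2)

    return u, v, strain  # one input map per network branch
```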
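The reweighting module selects features per optical flow branch via channel attention. A plausible instantiation is a squeeze-and-excitation-style gate, one per branch; the sketch below is an assumption about the module's form, and the layer sizes (`channels`, `reduction`) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class FlowReweight(nn.Module):
    """SE-style channel attention, a sketch of the abstract's
    multi-optical flow feature reweighting module (form assumed)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial context
        self.fc = nn.Sequential(             # excitation: per-channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # adaptively re-weight each channel

# One independent module per branch, so each flow modality
# (horizontal, vertical, strain) learns its own channel gates.
branches = nn.ModuleList(FlowReweight(64) for _ in range(3))
```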
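The logarithmically adjusted prior knowledge weighting loss shifts prediction scores by class priors, in the spirit of logit adjustment for long-tailed classification. The sketch below shows that idea; the temperature `tau`, the exact adjustment form, and the example class counts are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def log_prior_adjusted_loss(logits, targets, class_counts, tau=1.0):
    """Cross-entropy with a log(prior) adjustment of the prediction
    scores: a sketch of the logarithmically adjusted prior knowledge
    weighting loss (tau and the adjustment form are assumed)."""
    prior = class_counts.float() / class_counts.sum()
    # Shifting each class score by tau * log(prior) gives rare classes
    # a larger effective margin, countering imbalanced sample counts.
    adjusted = logits + tau * torch.log(prior + 1e-12)
    return F.cross_entropy(adjusted, targets)

# Hypothetical 5-class setup with imbalanced per-class counts.
counts = torch.tensor([32, 25, 63, 99, 27])
loss = log_prior_adjusted_loss(
    torch.randn(8, 5), torch.randint(0, 5, (8,)), counts
)
```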