School of Computing, Henan University of Engineering, Zhengzhou, China.
Comput Intell Neurosci. 2022 Jun 15;2022:6096325. doi: 10.1155/2022/6096325. eCollection 2022.
To address the emotional differences between different regions of a video frame and to exploit the interrelationships between regions, a region dual attention-based video emotion recognition method (RDAM) is proposed. RDAM takes video frame sequences as input and learns a discriminative video emotion representation that makes full use of both the emotional differences among regions and the interrelationships between them. Specifically, we construct two parallel attention modules. The first is the regional location attention module, which generates a weight for each feature region to quantify its relative importance; based on these weights, it produces an emotion feature that is sensitive to emotionally salient regions. The second is the regional relationship attention module, which generates a region relation matrix representing the pairwise relationships between regions of a video frame; based on this matrix, it produces an emotion feature that captures the interrelationships between regions. The outputs of the two attention modules are fused to yield the emotion feature of each video frame. The per-frame features are then fused by an attention-based fusion network to produce the final emotion feature of the video. Experimental results on video emotion recognition data sets show that the proposed method outperforms related works.
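The pipeline described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the scoring vectors `w` and `u` stand in for learned parameters, the element-wise sum used to fuse the two attention outputs and the softmax-based relation matrix are assumptions, and feature dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def regional_location_attention(F, w):
    # F: (R, D) region features of one frame; w: (D,) hypothetical scoring vector
    scores = softmax(F @ w)           # one weight per region (relative importance)
    return scores[:, None] * F        # emphasize emotionally salient regions

def regional_relation_attention(F):
    # Region relation matrix (R, R): pairwise affinities between regions
    A = softmax(F @ F.T, axis=-1)
    return A @ F                      # aggregate features from related regions

def frame_feature(F, w):
    # Fuse the two parallel attention outputs (element-wise sum is one simple choice),
    # then pool regions into a single frame-level emotion feature
    return (regional_location_attention(F, w) + regional_relation_attention(F)).mean(axis=0)

def video_feature(frames, w, u):
    # frames: list of (R, D) arrays; u: (D,) hypothetical frame-scoring vector
    feats = np.stack([frame_feature(F, w) for F in frames])  # (T, D)
    alpha = softmax(feats @ u)        # attention weight per frame
    return alpha @ feats              # (D,) final video emotion feature

rng = np.random.default_rng(0)
frames = [rng.standard_normal((4, 8)) for _ in range(3)]  # 3 frames, 4 regions each
w, u = rng.standard_normal(8), rng.standard_normal(8)
v = video_feature(frames, w, u)
print(v.shape)  # (8,)
```

In the paper the region features would come from a convolutional backbone and the attention parameters would be trained end-to-end; the sketch only mirrors the data flow of the two parallel modules and the attention-based temporal fusion.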