IEEE Trans Cybern. 2019 Mar;49(3):839-847. doi: 10.1109/TCYB.2017.2788081. Epub 2018 Jan 30.
In this paper, we propose a novel deep learning framework, the spatial-temporal recurrent neural network (STRNN), which integrates feature learning from both the spatial and temporal information of signal sources into a unified spatial-temporal dependency model. In STRNN, to capture spatially co-occurring variations of human emotions, a multidirectional recurrent neural network (RNN) layer captures long-range contextual cues by traversing the spatial regions of each temporal slice along different directions. A bidirectional temporal RNN layer then learns discriminative features that characterize the temporal dependencies of the sequences produced by the spatial RNN layer. To further select salient regions with greater discriminative power for emotion recognition, we impose a sparse projection onto the hidden states of both the spatial and temporal domains. The proposed two-layer RNN model thus provides an effective way to exploit both the spatial and temporal dependencies of the input signals for emotion recognition. Experimental results on public electroencephalogram and facial expression emotion datasets demonstrate that the proposed STRNN method is competitive with state-of-the-art methods.
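The two-layer structure described above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' implementation: it scans each temporal slice's spatial grid in only two directions (row-major forward and backward, a simplification of the paper's multidirectional traversal), uses plain tanh RNN cells, and approximates the sparse projection with a hypothetical soft-threshold on the hidden activations. All function names, dimensions, and the threshold value are assumptions for illustration.

```python
import numpy as np

def rnn_scan(xs, W_in, W_h, h0):
    """Scan a plain tanh RNN over a sequence of feature vectors."""
    h, hs = h0, []
    for x in xs:
        h = np.tanh(W_in @ x + W_h @ h)
        hs.append(h)
    return np.stack(hs)

def strnn_sketch(clip, d_hidden=8, seed=0):
    """clip: array (T, H, W, C) -- T temporal slices over an H x W grid of
    C-dimensional region features (e.g. electrode or facial-patch features).
    Returns per-slice features after a spatial RNN layer, a bidirectional
    temporal RNN layer, and a hypothetical sparsifying soft-threshold."""
    rng = np.random.default_rng(seed)
    T, H, W, C = clip.shape
    W_in_s = rng.standard_normal((d_hidden, C)) * 0.1
    W_h_s = rng.standard_normal((d_hidden, d_hidden)) * 0.1
    h0 = np.zeros(d_hidden)

    # Spatial layer: traverse each slice's regions in two scan orders and
    # keep the final hidden state of each traversal as the slice's code.
    codes = []
    for t in range(T):
        regions = clip[t].reshape(H * W, C)
        fwd = rnn_scan(regions, W_in_s, W_h_s, h0)[-1]
        bwd = rnn_scan(regions[::-1], W_in_s, W_h_s, h0)[-1]
        codes.append(np.concatenate([fwd, bwd]))
    codes = np.stack(codes)                      # (T, 2 * d_hidden)

    # Temporal layer: bidirectional RNN over the sequence of spatial codes.
    W_in_t = rng.standard_normal((d_hidden, 2 * d_hidden)) * 0.1
    W_h_t = rng.standard_normal((d_hidden, d_hidden)) * 0.1
    h_f = rnn_scan(codes, W_in_t, W_h_t, h0)
    h_b = rnn_scan(codes[::-1], W_in_t, W_h_t, h0)[::-1]
    feats = np.concatenate([h_f, h_b], axis=1)   # (T, 2 * d_hidden)

    # Stand-in for the sparse projection: soft-threshold small activations.
    return np.sign(feats) * np.maximum(np.abs(feats) - 0.05, 0.0)

clip = np.random.default_rng(1).standard_normal((4, 3, 3, 2))
feats = strnn_sketch(clip)
print(feats.shape)  # (4, 16)
```

A classifier head over `feats` (or its final time step) would complete an emotion-recognition pipeline; the point of the sketch is only the ordering of the two dependency-modeling layers.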