Department of Biomedical Engineering, Amirkabir University of Technology, 424 Hafez Ave, Tehran, Iran.
Neural Netw. 2018 Sep;105:304-315. doi: 10.1016/j.neunet.2018.05.016. Epub 2018 May 31.
Nonlinear components extracted from the deep structures of bottleneck neural networks exhibit a strong ability to represent the input space on a low-dimensional manifold. Sharing and combining these components boosts the networks' capability to synthesize and interpolate new, imagined data. This synthesis may serve as a simple model of imagination in the human brain, where components are represented on a nonlinear, low-dimensional manifold. The current paper introduces a novel Dynamic Deep Bottleneck Neural Network to analyze and extract three main features from videos of facial emotion expression. These features are identity, emotion, and expression intensity, which lie in three different sub-manifolds of one general nonlinear manifold. The proposed model, which exploits the advantages of recurrent networks, was used to analyze the sequence and dynamics of information in the videos. Notably, the model can also synthesize new videos showing variations of a specific emotion on the faces of unknown subjects. Experiments on the discrimination and recognition ability of the extracted components showed that the proposed model achieves an average accuracy of 97.77% in recognizing six prominent emotions (Fear, Surprise, Sadness, Anger, Disgust, and Happiness) and 78.17% in recognizing expression intensity. The produced videos show variations from neutral to the apex of an emotion on the face of an unfamiliar test subject, with an average similarity of 0.8 to the reference videos on the SSIM scale.
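The 0.8 similarity score above refers to the structural similarity index (SSIM) between synthesized and reference frames. As a rough illustration only (not the authors' evaluation pipeline), the sketch below computes a single-window, global SSIM with NumPy, using the standard stabilizing constants K1 = 0.01 and K2 = 0.03 from Wang et al.'s formulation; practical evaluations typically use a sliding Gaussian window instead, as in `skimage.metrics.structural_similarity`.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Global (single-window) SSIM between two grayscale images.

    Hypothetical simplification of the windowed SSIM metric:
    statistics are computed once over the whole image instead of
    per local window, so scores differ somewhat from windowed SSIM.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    # SSIM = luminance/contrast/structure terms combined.
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return num / den

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.uniform(0, 255, size=(32, 32))
    print(ssim_global(frame, frame))          # identical frames -> 1.0
    print(ssim_global(frame, 255.0 - frame))  # inverted frame -> much lower
```

A score of 1.0 indicates identical frames, so the reported 0.8 average indicates strong, but not perfect, structural agreement between generated and reference videos.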