Meng Hongying, Bianchi-Berthouze Nadia, Deng Yangdong, Cheng Jinkuang, Cosmas John P
IEEE Trans Cybern. 2016 Apr;46(4):916-29. doi: 10.1109/TCYB.2015.2418092. Epub 2015 Apr 21.
Automatic continuous prediction of affective state from naturalistic facial expressions is a challenging research topic of considerable importance in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods are used to produce a prediction for each individual video frame; in the second stage, a time-delay neural network (TDNN) models the temporal relationships between consecutive predictions. The two-stage approach thus separates the modeling of emotional state dynamics from the prediction of an individual emotional state from input features. As a result, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames, which allows the network to more easily exploit the slowly changing dynamics between emotional states. The system was tested and evaluated on three different facial expression video datasets. The experimental results demonstrate that a two-stage approach combined with the TDNN, which takes the predictions for previous frames into account, significantly improves the overall performance of continuous emotional state estimation in naturalistic facial expressions. The proposed approach won the affect recognition sub-challenge of the Third International Audio/Visual Emotion Recognition Challenge.
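The two-stage idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic data, the use of closed-form ridge regression as the "traditional regression method", the helper name `delay_windows`, and the hyperparameters (`n_delays`, `n_hidden`, `lr`). Stage 1 predicts one value per frame from noisy features; stage 2 feeds a sliding window of those predictions into a small network with a tapped delay line, so the temporal model sees only the low-dimensional prediction sequence rather than the raw features.

```python
import numpy as np

rng = np.random.default_rng(0)


def delay_windows(preds, n_delays):
    """Pair each prediction with its n_delays predecessors (tapped delay line).

    Row t is [p_{t-n_delays}, ..., p_{t-1}, p_t]; the start of the sequence
    is padded with the first prediction. Hypothetical helper, not from the paper.
    """
    padded = np.concatenate([np.full(n_delays, preds[0]), preds])
    return np.stack([padded[i:i + n_delays + 1] for i in range(len(preds))])


def mse(a, b):
    return float(np.mean((a - b) ** 2))


# Synthetic stand-in data (assumption): a slowly varying affect label and
# per-frame features that are noisy linear functions of it.
n_frames, n_features = 600, 8
label = np.sin(2 * np.pi * np.arange(n_frames) / 200.0)
mixing = rng.normal(size=n_features)
features = np.outer(label, mixing) + rng.normal(size=(n_frames, n_features))

# Stage 1: frame-by-frame ridge regression, solved in closed form.
lam = 1.0
gram = features.T @ features + lam * np.eye(n_features)
w = np.linalg.solve(gram, features.T @ label)
stage1 = features @ w

# Stage 2: one-hidden-layer network over windows of stage-1 predictions,
# trained by plain gradient descent on the mean squared error.
n_delays, n_hidden, lr = 10, 8, 0.01
X = delay_windows(stage1, n_delays)
W1 = rng.normal(scale=0.1, size=(n_delays + 1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    g_out = 2.0 * (out - label) / n_frames      # dLoss/dOut
    g_h = np.outer(g_out, W2) * (1.0 - h ** 2)  # backprop through tanh
    W2 -= lr * (h.T @ g_out)
    b2 -= lr * g_out.sum()
    W1 -= lr * (X.T @ g_h)
    b1 -= lr * g_h.sum(axis=0)

stage2 = np.tanh(X @ W1 + b1) @ W2 + b2

# In-sample errors only; a real evaluation would use held-out videos.
print("stage 1 MSE:", mse(stage1, label))
print("stage 2 MSE:", mse(stage2, label))
```

The sketch trains and evaluates on the same sequence for brevity, so the printed errors only indicate how the two stages relate on this toy signal and say nothing about the results reported in the paper.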