Aleix Martinez, Shichuan Du
Department of Electrical and Computer Engineering, The Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA.
J Mach Learn Res. 2012 May 1;13:1589-1608.
In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion: the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprised. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expressions of emotion, and propose research directions for machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in studies of human perception, social interactions and disorders.
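The model's core computational idea (a face described by configural, landmark-derived dimensions, scored in one continuous space per emotion category, with compound categories obtained by linearly combining those spaces) lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the features (normalized pairwise landmark distances, a simple stand-in for configural dimensions), the single-prototype summary of each face space, the similarity scoring, and all names are hypothetical.

```python
# Minimal sketch of the multi-space model described above. Everything here is
# an illustrative assumption: the configural features (normalized pairwise
# landmark distances), the single-prototype summary of each continuous face
# space, the similarity scoring, and all names are hypothetical.
import numpy as np

EMOTIONS = ["happy", "surprise", "anger", "sad", "fear", "disgust"]

def configural_features(landmarks):
    """Configural description of a face: pairwise distances between detected
    facial landmarks, normalized for face size. Per the model, obtaining the
    landmarks precisely is the hard part; this step is cheap once they exist.
    """
    diffs = landmarks[:, None, :] - landmarks[None, :, :]   # (n, n, 2)
    dists = np.linalg.norm(diffs, axis=-1)                  # (n, n)
    upper = np.triu_indices(len(landmarks), k=1)            # unique pairs
    feats = dists[upper]
    return feats / (feats.max() + 1e-9)

def category_scores(feats, prototypes):
    """Score the face in each category's continuous space. Each space is
    summarized by one prototype vector; cosine similarity stands in for
    position along that space (and hence perceived intensity)."""
    return {
        cat: float(feats @ proto) /
             (np.linalg.norm(feats) * np.linalg.norm(proto) + 1e-9)
        for cat, proto in prototypes.items()
    }

def compound_score(scores, weights):
    """Compound emotions as linear combinations of the category spaces,
    e.g. 'happily surprised' as a weighted sum of happy and surprise."""
    return sum(weights[cat] * scores[cat] for cat in weights)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prototypes = {cat: rng.random(45) for cat in EMOTIONS}  # stand-in spaces
    landmarks = rng.random((10, 2))                         # stand-in detector output
    scores = category_scores(configural_features(landmarks), prototypes)
    print(compound_score(scores, {"happy": 0.5, "surprise": 0.5}))
```

In this sketch, a hard category judgment would be an argmax over the per-space scores, while intensity varies continuously within each space, mirroring the two behaviors the abstract asks a combined model to explain.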