Wu Shichao, Zhou Lei, Hu Zhengxi, Liu Jingtai
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3725-3739. doi: 10.1109/TNNLS.2022.3196831. Epub 2024 Feb 29.
To infer intentions more accurately, we often try to figure out the emotional states of other people during social communication. Many studies in affective computing infer emotions by perceiving human states such as facial expression and body posture. Such methods perform well in controlled environments, but in unconstrained circumstances they often misestimate emotions because effective inputs are lacking; this is where context-aware emotion recognition emerged. Taking inspiration from the reasoning pattern humans apply when recognizing perceived emotions, we propose a hierarchical context-based emotion recognition method built on scene graphs. We extract three contexts from the image: the entity context, the global context, and the scene context. The scene context contains abstract information about entity labels and their relationships, which resembles the information processing of the human visual sensing mechanism. These contexts are then fused to perform emotion recognition. We carried out extensive experiments on widely used context-aware emotion datasets: CAER-S, EMOTIC, and the BOdy Language Dataset (BoLD). We demonstrate that hierarchical contexts benefit emotion recognition, improving the state-of-the-art (SOTA) accuracy on CAER-S from 84.82% to 90.83%. Ablation experiments show that the hierarchical contexts provide complementary information. Our method improves the SOTA F1 score (C-F1) on EMOTIC from 29.33% to 30.24%. We also build an image-based emotion recognition task, BoLD-Img, from BoLD and obtain a better emotion recognition score (ERS) of 0.2153.
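As a rough illustration of the three-branch design described in the abstract, the sketch below fuses an entity feature, a global feature, and a scene-graph-derived scene feature before classifying emotions. The module names, feature dimensions, and fusion-by-concatenation choice are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's architecture): fuse three
# context features -- entity, global, and scene-graph-based scene context --
# by concatenation, then classify emotions.
import torch
import torch.nn as nn

class HierarchicalContextFusion(nn.Module):
    def __init__(self, entity_dim=512, global_dim=512, scene_dim=256, num_emotions=7):
        super().__init__()
        # Each branch projects its context into a shared embedding space.
        self.entity_proj = nn.Linear(entity_dim, 256)
        self.global_proj = nn.Linear(global_dim, 256)
        self.scene_proj = nn.Linear(scene_dim, 256)
        # Fused representation -> emotion logits.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * 256, num_emotions),
        )

    def forward(self, entity_feat, global_feat, scene_feat):
        fused = torch.cat([
            self.entity_proj(entity_feat),   # target-person (entity) context
            self.global_proj(global_feat),   # whole-image (global) context
            self.scene_proj(scene_feat),     # scene-graph (scene) context
        ], dim=-1)
        return self.classifier(fused)

# Example with random features standing in for backbone / scene-graph outputs.
model = HierarchicalContextFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 7])
```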