IEEE Trans Image Process. 2022;31:5189-5202. doi: 10.1109/TIP.2022.3193749. Epub 2022 Aug 4.
Visual Emotion Analysis (VEA), which aims to predict people's emotions towards different visual stimuli, has recently become an attractive research topic. Rather than a single-label classification task, it is more reasonable to regard VEA as a Label Distribution Learning (LDL) problem, in which the label distribution is formed by the votes of different individuals. Existing methods often predict the visual emotion distribution with a single unified network, neglecting the subjectivity inherent in the crowd voting process. In psychology, the Object-Appraisal-Emotion model has demonstrated that each individual's emotion is affected by his or her subjective appraisal, which is in turn formed by affective memory. Inspired by this, we propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution. To depict the diversity of the crowd voting process, we first propose Subjectivity Appraising, a multi-branch module in which each branch simulates the emotion evocation process of a specific individual. Specifically, we construct the affective memory with an attention-based mechanism to preserve each individual's unique emotional experience. A subjectivity loss is further proposed to encourage divergence between different individuals. Moreover, we propose Subjectivity Matching with a matching loss, which uses the Hungarian algorithm to assign unordered emotion labels to ordered individual predictions in a one-to-one correspondence. Extensive experiments and comparisons on public visual emotion distribution datasets show that the proposed SAMNet consistently outperforms state-of-the-art methods. An ablation study verifies the effectiveness of our method, and visualizations demonstrate its interpretability.
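To make the LDL framing concrete, the following minimal sketch turns a set of individual votes into a label distribution and scores a predicted distribution against it. It is an illustration only, not the authors' code: the use of KL divergence as the distribution loss, the three-class label set, and the names `votes_to_distribution` and `kl_divergence` are assumptions introduced here.

```python
import math
from collections import Counter


def votes_to_distribution(votes, num_classes):
    """Convert a list of voted class indices into a normalized distribution."""
    counts = Counter(votes)
    total = len(votes)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred); terms with zero target mass contribute nothing."""
    return sum(
        t * math.log(t / max(q, eps))
        for t, q in zip(target, pred)
        if t > 0
    )


# Hypothetical example: four viewers vote on one image over three emotion classes.
votes = [0, 0, 1, 2]
target = votes_to_distribution(votes, num_classes=3)

# A hypothetical network output (already normalized, e.g. by a softmax).
pred = [0.4, 0.4, 0.2]
loss = kl_divergence(target, pred)
print(target, loss)
```

A single-label formulation would keep only the majority class (class 0 here) and discard the minority votes, which is the information the distribution view retains.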
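The Subjectivity Matching step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the cross-entropy matching cost, the assumption that the number of branches equals the number of votes, and the names `hungarian`, `matching_loss`, `branch_probs`, and `votes` are hypothetical choices made for illustration. The sketch builds a cost matrix between the ordered per-branch predictions and the unordered votes, solves the one-to-one assignment with a standard O(n^3) Hungarian (Kuhn-Munkres) routine, and averages the matched costs.

```python
import math


def hungarian(cost):
    """Minimum-cost one-to-one assignment for an n x m cost matrix (n <= m).

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (1-based)
    way = [0] * (m + 1)   # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def matching_loss(branch_probs, votes, eps=1e-12):
    """Match each branch to one vote and return (assignment, mean matched cost).

    Assumed cost: cross-entropy of branch i's prediction against vote j.
    """
    cost = [[-math.log(max(probs[v], eps)) for v in votes] for probs in branch_probs]
    match = hungarian(cost)
    loss = sum(cost[i][j] for i, j in enumerate(match)) / len(match)
    return match, loss


# Hypothetical example: three branches (simulated individuals), three emotion classes.
branch_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
votes = [1, 2, 0]  # unordered emotion labels collected from three voters

match, loss = matching_loss(branch_probs, votes)
print(match, loss)  # match[i] is the index into `votes` assigned to branch i
```

Because the votes carry no voter identity, a fixed branch-to-vote pairing would penalize a branch for a label that another branch already predicts well; solving the assignment first makes the loss invariant to the order in which votes are listed.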