Fu Changzeng, Qian Fengkui, Su Kaifeng, Su Yikai, Wang Ze, Shi Jiaqi, Liu Zhigang, Liu Chaoran, Ishi Carlos Toshinori
Northeastern University, China; Osaka University, Japan; RIKEN, Japan; Hebei Key Laboratory of Marine Perception Network and Data Processing, China.
Northeastern University, China.
Neural Netw. 2025 Jan;181:106764. doi: 10.1016/j.neunet.2024.106764. Epub 2024 Sep 28.
Emotion recognition in conversation (ERC) is a vital task that requires deciphering human emotions by analyzing contextual and multimodal information. However, existing ERC research concentrates mainly on multimodal fusion and overlooks the limitations of models in handling discrepancies between unimodal representations and in modeling speaker dependencies. To address these problems, this paper proposes a Hierarchical decision fusion-based Local-Global Graph Neural Network for multimodal ERC (HiMul-LGG). HiMul-LGG employs a hierarchical decision fusion strategy to ensure feature alignment across modalities. It adopts a local-global graph neural network architecture to reinforce inter-modality and intra-modality speaker dependencies, and it uses a cross-modal multi-head attention mechanism to promote interaction between modalities. We evaluate HiMul-LGG on two emotion recognition datasets, IEMOCAP and MELD, where it outperforms existing methods. The ablation study results also indicate the effectiveness of the proposed hierarchical decision fusion strategy and of the local-global structure used in graph construction.
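To make the cross-modal multi-head attention idea concrete, the following is a minimal NumPy sketch of the generic mechanism: queries come from one modality, keys and values from another. It is not the HiMul-LGG implementation, since the abstract does not specify it. The function name `cross_modal_attention`, the feature sizes, and the random projection matrices (stand-ins for learned weights) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_feats, context_feats, num_heads, rng):
    """Generic multi-head attention with queries from one modality and
    keys/values from another. Hypothetical helper, not the paper's code."""
    n_q, d = query_feats.shape
    n_c, d_c = context_feats.shape
    assert d == d_c and d % num_heads == 0
    d_head = d // num_heads

    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split_heads(x, n):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(query_feats @ w_q, n_q)
    k = split_heads(context_feats @ w_k, n_c)
    v = split_heads(context_feats @ w_v, n_c)

    # Scaled dot-product scores: (num_heads, n_q, n_c).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge heads back to (n_q, d).
    heads = weights @ v
    merged = heads.transpose(1, 0, 2).reshape(n_q, d)
    return merged @ w_o, weights


# Toy example: 6 utterances, 16-dimensional text and audio features (made up).
rng = np.random.default_rng(0)
text_feats = rng.standard_normal((6, 16))
audio_feats = rng.standard_normal((6, 16))

# Text utterances attend over the audio features of the same conversation.
fused, weights = cross_modal_attention(text_feats, audio_feats, num_heads=4, rng=rng)
print(fused.shape, weights.shape)
```

In this sketch each text utterance produces one attention distribution per head over the audio utterances, so every row of `weights` sums to 1. In a trained model the projections would be learned, and the attended features would typically be combined with the original ones before classification.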