Zhou Qingping
School of Software, Hunan College of Information, Changsha, Hunan Province, China.
PLoS One. 2025 Sep 5;20(9):e0330632. doi: 10.1371/journal.pone.0330632. eCollection 2025.
This research proposes a new Emotion Recognition in Conversation (ERC) model, Hierarchical Graph Learning for Emotion Recognition (HGLER), designed to overcome the difficulty existing approaches have in capturing long-range context and interactions across different data types. Rather than simply fusing different kinds of information, as traditional methods do, HGLER uses a two-part graph technique in which each conversation is represented in two complementary ways: one graph captures how the parts of the conversation relate to one another, and the other strengthens learning across the different modalities. This dual-graph design lets the model represent each modality on its own terms, exploiting the strengths of each type of data while still tracking their interactions. HGLER was applied to two widely used multimodal datasets, IEMOCAP and MELD, which combine text, visual, and audio information, to assess how well the model understands emotions in conversations. Standard preprocessing was applied for consistency, and the datasets were split into training, validation, and test sets following previous work. On IEMOCAP, HGLER achieved an F1-score of 96.36% and an accuracy of 96.28%; on MELD, it achieved an F1-score of 96.82% and an accuracy of 93.68%, surpassing several state-of-the-art methods. The model also showed strong convergence, generalization, and training stability. These findings demonstrate that hierarchical graph-based learning can enhance emotional understanding in systems that handle multiple forms of information in conversation. However, slight fluctuations in validation loss suggest that model stability and generalization can still be improved. Overall, the results confirm that hierarchical graph-based learning works well for multimodal ERC and promises to improve emotional understanding in conversational AI systems.
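To make the dual-graph idea concrete, the sketch below shows one plausible way such a block could be wired up: a cross-modal graph connecting the text, audio, and visual nodes of each utterance, followed by a conversation-structure graph over the fused utterance features. This is only an illustrative sketch, not the authors' implementation; the class names (SimpleGraphConv, DualGraphERCSketch), the dense-adjacency graph convolution, the feature dimensions, and the fully connected modality graph are all assumptions, since the abstract does not specify architectural details.

```python
# Minimal illustrative sketch of a dual-graph ERC block (not the authors' HGLER code).
# Assumed: per-modality utterance features, a simple dense-adjacency GCN step,
# and hypothetical class/variable names.
import torch
import torch.nn as nn


class SimpleGraphConv(nn.Module):
    """One graph-convolution step over a dense adjacency: H' = ReLU(norm(A) H W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Row-normalize so each node averages its neighbors' features.
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin(adj @ h))


class DualGraphERCSketch(nn.Module):
    """Hypothetical dual-graph block: a cross-modal graph per utterance plus a
    conversation-structure graph over the fused utterance representations."""

    def __init__(self, dim: int = 128, num_classes: int = 6):
        super().__init__()
        self.crossmodal_gnn = SimpleGraphConv(dim, dim)  # graph 1: modality interactions
        self.context_gnn = SimpleGraphConv(dim, dim)     # graph 2: utterance-to-utterance context
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text, audio, visual, context_adj):
        # text/audio/visual: (num_utterances, dim); context_adj: (num_utterances, num_utterances)
        n = text.size(0)
        # Cross-modal graph: stack the three modality nodes per utterance, fully connected.
        modal_nodes = torch.stack([text, audio, visual], dim=1)            # (n, 3, dim)
        modal_adj = torch.ones(n, 3, 3)                                     # all modalities interact
        fused = self.crossmodal_gnn(modal_nodes, modal_adj).mean(dim=1)     # (n, dim) fused utterances
        # Conversation graph: propagate fused features along the dialogue structure.
        context = self.context_gnn(fused, context_adj)                      # (n, dim)
        return self.classifier(context)                                     # per-utterance emotion logits


if __name__ == "__main__":
    # Toy usage: a 4-utterance dialogue with random features and a full context graph.
    n, dim = 4, 128
    model = DualGraphERCSketch(dim)
    logits = model(torch.randn(n, dim), torch.randn(n, dim), torch.randn(n, dim),
                   torch.ones(n, n))
    print(logits.shape)  # torch.Size([4, 6])
```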