Multi-modal emotion recognition in conversation based on prompt learning with text-audio fusion features.

Author information

Wu Yuezhou, Zhang Siling, Li Pengfei

Affiliation

School of Computer Science, Civil Aviation Flight University of China, Guanghan, 618307, China.

Publication information

Sci Rep. 2025 Mar 14;15(1):8855. doi: 10.1038/s41598-025-89758-8.

DOI:10.1038/s41598-025-89758-8

PMID:40087340

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11909257/

Abstract

With the widespread adoption of interactive machine applications, Emotion Recognition in Conversations (ERC) technology has garnered increasing attention. Although existing methods have improved recognition accuracy by integrating structured data, language barriers and the scarcity of non-English resources limit their cross-lingual applications. In light of this, the MERC-PLTAF method proposed in this paper innovatively focuses on multimodal emotion recognition in conversations, aiming to overcome the limitations of single modality and language barriers through refined feature extraction and a sophisticated cross-fusion strategy. We conducted extensive validation on multiple English and Chinese datasets, and the experimental results demonstrate that this method not only significantly improves emotion recognition accuracy but also exhibits exceptional performance on the Chinese M3ED dataset, paving a new path for cross-lingual emotion recognition. This research not only advances the boundaries of emotion recognition technology but also lays a solid theoretical foundation and practical framework for creating more intelligent and human-centric interactive experiences.
