
Suppr 超能文献



Research on cross-modal emotion recognition based on multi-layer semantic fusion.

Authors

Xu Zhijing, Gao Yang

Affiliation

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.

Publication

Math Biosci Eng. 2024 Jan 17;21(2):2488-2514. doi: 10.3934/mbe.2024110.

DOI: 10.3934/mbe.2024110
PMID: 38454693
Abstract

Multimodal emotion analysis involves the integration of information from various modalities to better understand human emotions. In this paper, we propose the Cross-modal Emotion Recognition based on multi-layer semantic fusion (CM-MSF) model, which aims to leverage the complementarity of important information between modalities and extract advanced features in an adaptive manner. To achieve comprehensive and rich feature extraction from multimodal sources, considering different dimensions and depth levels, we design a parallel deep learning algorithm module that focuses on extracting features from individual modalities, ensuring cost-effective alignment of extracted features. Furthermore, a cascaded cross-modal encoder module based on Bidirectional Long Short-Term Memory (BILSTM) layer and Convolutional 1D (ConV1d) is introduced to facilitate inter-modal information complementation. This module enables the seamless integration of information across modalities, effectively addressing the challenges associated with signal heterogeneity. To facilitate flexible and adaptive information selection and delivery, we design the Mask-gated Fusion Networks (MGF-module), which combines masking technology with gating structures. This approach allows for precise control over the information flow of each modality through gating vectors, mitigating issues related to low recognition accuracy and emotional misjudgment caused by complex features and noisy redundant information. The CM-MSF model underwent evaluation using the widely recognized multimodal emotion recognition datasets CMU-MOSI and CMU-MOSEI. The experimental findings illustrate the exceptional performance of the model, with binary classification accuracies of 89.1% and 88.6%, as well as F1 scores of 87.9% and 88.1% on the CMU-MOSI and CMU-MOSEI datasets, respectively. These results unequivocally validate the effectiveness of our approach in accurately recognizing and classifying emotions.
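The mask-gated fusion idea described in the abstract (a gating vector that blends two modality streams, with a mask suppressing noisy or padded positions) can be sketched roughly as follows. This is a minimal NumPy illustration of the general technique, not the authors' CM-MSF implementation; the function name, weight shapes, and mask layout are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mask_gated_fusion(x, y, W, b, mask):
    """Illustrative mask-gated fusion of two modality feature vectors.

    A gating vector g, computed from a learned projection of the
    concatenated features, controls how much of each modality flows
    into the fused representation; the binary mask zeroes out
    positions treated as invalid (e.g. padding or noise).
    """
    g = sigmoid(np.concatenate([x, y], axis=-1) @ W + b)  # gate values in (0, 1)
    fused = g * x + (1.0 - g) * y                          # element-wise convex blend
    return fused * mask                                    # suppress masked positions

# Toy example: two 4-dim modality features with random gate weights.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # e.g. a text feature (hypothetical)
y = rng.standard_normal(4)          # e.g. an audio feature (hypothetical)
W = rng.standard_normal((8, 4))     # gate projection: concat dim -> feature dim
b = np.zeros(4)
mask = np.array([1.0, 1.0, 1.0, 0.0])  # last position masked out

out = mask_gated_fusion(x, y, W, b, mask)
print(out.shape)  # (4,)
```

Because the gate is a sigmoid, each fused component lies between the corresponding components of the two inputs, which is the sense in which the gate "controls the information flow" of each modality.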


Similar Articles

1. Research on cross-modal emotion recognition based on multi-layer semantic fusion.
   Math Biosci Eng. 2024 Jan 17;21(2):2488-2514. doi: 10.3934/mbe.2024110.
2. Multimodal Emotion Recognition Based on Cascaded Multichannel and Hierarchical Fusion.
   Comput Intell Neurosci. 2023 Jan 5;2023:9645611. doi: 10.1155/2023/9645611. eCollection 2023.
3. Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning.
   Front Neurorobot. 2021 Jul 9;15:697634. doi: 10.3389/fnbot.2021.697634. eCollection 2021.
4. A novel transformer autoencoder for multi-modal emotion recognition with incomplete data.
   Neural Netw. 2024 Apr;172:106111. doi: 10.1016/j.neunet.2024.106111. Epub 2024 Jan 6.
5. Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features.
   Sensors (Basel). 2023 Jun 9;23(12):5475. doi: 10.3390/s23125475.
6. A novel feature fusion network for multimodal emotion recognition from EEG and eye movement signals.
   Front Neurosci. 2023 Aug 3;17:1234162. doi: 10.3389/fnins.2023.1234162. eCollection 2023.
7. Cross-modal credibility modelling for EEG-based multimodal emotion recognition.
   J Neural Eng. 2024 Apr 11;21(2). doi: 10.1088/1741-2552/ad3987.
8. LGCCT: A Light Gated and Crossed Complementation Transformer for Multimodal Speech Emotion Recognition.
   Entropy (Basel). 2022 Jul 21;24(7):1010. doi: 10.3390/e24071010.
9. A Parallel Multi-Modal Factorized Bilinear Pooling Fusion Method Based on the Semi-Tensor Product for Emotion Recognition.
   Entropy (Basel). 2022 Dec 16;24(12):1836. doi: 10.3390/e24121836.
10. A Multi-Modal Convolutional Neural Network Model for Intelligent Analysis of the Influence of Music Genres on Children's Emotions.
    Comput Intell Neurosci. 2022 Jul 19;2022:4957085. doi: 10.1155/2022/4957085. eCollection 2022.