Mutual Correlation Attentive Factors in Dyadic Fusion Networks for Speech Emotion Recognition.

Authors

Gu Yue, Lyu Xinyu, Sun Weijia, Li Weitian, Chen Shuhong, Li Xinyu, Ivan Marsic

Affiliations

Rutgers University.

Amazon Inc., Rutgers University.

Publication

Proc ACM Int Conf Multimed. 2019 Oct;2019:157-166. doi: 10.1145/3343031.3351039.

DOI: 10.1145/3343031.3351039
PMID: 32201866
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7085887/
Abstract

Emotion recognition in dyadic communication is challenging because: 1. Extracting informative modality-specific representations requires disparate feature extractor designs due to the heterogeneous input data formats. 2. Effectively and efficiently fusing unimodal features and learning associations between dyadic utterances is critical to model generalization in real-world scenarios. 3. Disagreeing annotations prevent previous approaches from precisely predicting emotions in context. To address these issues, we propose an efficient dyadic fusion network that relies only on an attention mechanism to select representative vectors, fuse modality-specific features, and learn the sequence information. Our approach has three distinct characteristics: 1. Instead of using a recurrent neural network to extract temporal associations as in most previous research, we introduce multiple sub-view attention layers to compute the relevant dependencies among sequential utterances; this significantly improves model efficiency. 2. To improve fusion performance, we design a learnable mutual correlation factor inside each attention layer to compute associations across different modalities. 3. To overcome the label disagreement issue, we embed the labels from all annotators into a k-dimensional vector and transform the categorical problem into a regression problem; this method provides more accurate annotation information and fully uses the entire dataset. We evaluate the proposed model on two published multimodal emotion recognition datasets: IEMOCAP and MELD. Our model significantly outperforms previous state-of-the-art approaches by 3.8%-7.5% accuracy while being more efficient.
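The second contribution (a learnable mutual correlation factor inside each attention layer) can be pictured with a short sketch. The following is a minimal PyTorch illustration, not the authors' implementation: the class name, the per-dimension placement of the correlation factor, and all dimensions are assumptions; the paper's exact formulation should be taken from the full text.

```python
# Minimal sketch of cross-modal attention with a learnable mutual
# correlation factor. Illustrative only: every name and dimension here
# (MutualCorrelationAttention, d_model, the per-feature factor) is assumed,
# not taken from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualCorrelationAttention(nn.Module):
    """Attend from one modality (e.g. audio) to another (e.g. text),
    modulating the attention queries with a learnable correlation factor."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Learnable mutual correlation factor: one scalar per feature
        # dimension, controlling how strongly the two modalities interact.
        self.corr = nn.Parameter(torch.ones(d_model))
        self.scale = d_model ** -0.5

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a, x_b: (batch, seq_len, d_model) utterance-level features
        q = self.q(x_a) * self.corr           # inject cross-modal correlation
        k, v = self.k(x_b), self.v(x_b)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                        # fused representation

# Toy usage: fuse 8-utterance audio and text sequences of width 64.
audio = torch.randn(2, 8, 64)
text = torch.randn(2, 8, 64)
fused = MutualCorrelationAttention(64)(audio, text)
print(fused.shape)  # torch.Size([2, 8, 64])
```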

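The third contribution (embedding all annotators' labels into a k-dimensional vector and regressing against it) amounts to replacing a single majority-vote class label with a soft label distribution. Below is a minimal sketch under stated assumptions: k equals the number of emotion categories, the emotion set is hypothetical, and a mean-squared-error loss is used; the paper's actual embedding and loss may differ.

```python
# Sketch of turning disagreeing annotator labels into a k-dimensional
# regression target. Assumed setup: k = number of emotion categories,
# MSE loss; the paper's exact embedding and loss may differ.
import torch
import torch.nn.functional as F

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # k = 4 (hypothetical set)

def embed_annotations(votes: list) -> torch.Tensor:
    """Map one utterance's annotator votes to a k-dim target vector."""
    target = torch.zeros(len(EMOTIONS))
    for v in votes:
        target[EMOTIONS.index(v)] += 1.0
    return target / target.sum()  # normalized vote distribution

# Three annotators disagree: the target keeps all of their information
# instead of discarding the minority vote via majority voting.
target = embed_annotations(["happy", "happy", "neutral"])
print(target)  # tensor([0.0000, 0.6667, 0.3333, 0.0000])

prediction = torch.softmax(torch.randn(len(EMOTIONS)), dim=0)
loss = F.mse_loss(prediction, target)  # regression, not classification
```

Framed this way, every annotated utterance contributes a usable target even when annotators disagree, which is how the abstract's claim of "fully uses the entire dataset" can be read.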

Similar Articles

1. Mutual Correlation Attentive Factors in Dyadic Fusion Networks for Speech Emotion Recognition.
Proc ACM Int Conf Multimed. 2019 Oct;2019:157-166. doi: 10.1145/3343031.3351039.
2. Multimodal transformer augmented fusion for speech emotion recognition.
Front Neurorobot. 2023 May 22;17:1181598. doi: 10.3389/fnbot.2023.1181598. eCollection 2023.
3. Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
4. Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network.
Front Psychol. 2023 Jan 9;13:1075624. doi: 10.3389/fpsyg.2022.1075624. eCollection 2022.
5. Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.
6. Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
7. Multimodal Emotion Recognition Based on Cascaded Multichannel and Hierarchical Fusion.
Comput Intell Neurosci. 2023 Jan 5;2023:9645611. doi: 10.1155/2023/9645611. eCollection 2023.
8. Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features.
Sensors (Basel). 2023 Jun 9;23(12):5475. doi: 10.3390/s23125475.
9. GCF-Net: global-aware cross-modal feature fusion network for speech emotion recognition.
Front Neurosci. 2023 May 4;17:1183132. doi: 10.3389/fnins.2023.1183132. eCollection 2023.
10. Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network.
Sensors (Basel). 2019 Jun 18;19(12):2730. doi: 10.3390/s19122730.

Cited By

1. CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French.
Proc Conf Empir Methods Nat Lang Process. 2020 Nov;2020:1801-1812. doi: 10.18653/v1/2020.emnlp-main.141.

References

1. Multi-attention Recurrent Network for Human Communication Comprehension.
Proc AAAI Conf Artif Intell. 2018 Feb;2018:5642-5649.
2. Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder.
Proc ACM Int Conf Multimed. 2018 Oct;2018:537-545. doi: 10.1145/3240508.3240714.
3. Deep Multimodal Learning for Emotion Recognition in Spoken Language.
Proc IEEE Int Conf Acoust Speech Signal Process. 2018 Apr;2018:5079-5083. doi: 10.1109/ICASSP.2018.8462440. Epub 2018 Sep 13.
4. Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment.
Proc Conf Assoc Comput Linguist Meet. 2018 Jul;2018:2225-2235.
5. Hybrid Attention based Multimodal Network for Spoken Language Classification.
Proc Conf Assoc Comput Linguist Meet. 2018 Aug;2018:2379-2390.
6. Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering.
Proc ACM Int Conf Multimodal Interact. 2012;2012:485-492. doi: 10.1145/2388676.2388781.