Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.

Affiliations

College of Computer and Information Engineering, Tianjin Normal University, Tianjin, China.

GLAM - Group on Language, Audio, & Music, Imperial College London, UK.

Publication information

Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.

DOI: 10.1016/j.neunet.2021.03.013
PMID: 33866302

Abstract

A challenging issue in the field of the automatic recognition of emotion from speech is the efficient modelling of long temporal contexts. Moreover, when incorporating long-term temporal dependencies between features, recurrent neural network (RNN) architectures are typically employed by default. In this work, we aim to present an efficient deep neural network architecture incorporating Connectionist Temporal Classification (CTC) loss for discrete speech emotion recognition (SER). Moreover, we also demonstrate the existence of further opportunities to improve SER performance by exploiting the properties of convolutional neural networks (CNNs) when modelling contextual information. Our proposed model uses parallel convolutional layers (PCN) integrated with Squeeze-and-Excitation Network (SEnet), a system herein denoted as PCNSE, to extract relationships from 3D spectrograms across timesteps and frequencies; here, we use the log-Mel spectrogram with deltas and delta-deltas as input. In addition, a self-attention Residual Dilated Network (SADRN) with CTC is employed as a classification block for SER. To the best of the authors' knowledge, this is the first time that such a hybrid architecture has been employed for discrete SER. We further demonstrate the effectiveness of our proposed approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and FAU-Aibo Emotion corpus (FAU-AEC). Our experimental results reveal that the proposed method is well-suited to the task of discrete SER, achieving a weighted accuracy (WA) of 73.1% and an unweighted accuracy (UA) of 66.3% on IEMOCAP, as well as a UA of 41.1% on the FAU-AEC dataset.
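The abstract reports both a weighted accuracy (WA) and an unweighted accuracy (UA). The sketch below shows the definitions commonly used in the SER literature: WA as overall utterance-level accuracy and UA as the mean of per-class recalls. It is an illustration only. The abstract does not spell out the authors' evaluation protocol, so the function names and the toy labels are assumptions and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified utterances / all utterances."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every class counts equally, whatever its size."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(y_true)  # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["neutral", "neutral"]

wa = weighted_accuracy(y_true, y_pred)    # 7 of 10 utterances are correct
ua = unweighted_accuracy(y_true, y_pred)  # recalls: 6/6, 1/2, 0/2
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

On imbalanced data the two measures can diverge considerably: a classifier that favours the majority class keeps a high WA while its UA drops. This is why results on corpora such as IEMOCAP and FAU-AEC are often reported with UA.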


Similar articles

1. Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
   Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.
2. Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
   Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
3. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
   Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
4. A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
   PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
5. BAT: Block and token self-attention for speech emotion recognition.
   Neural Netw. 2022 Dec;156:67-80. doi: 10.1016/j.neunet.2022.09.022. Epub 2022 Sep 29.
6. Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
   Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
7. A Hybrid Time-Distributed Deep Neural Architecture for Speech Emotion Recognition.
   Int J Neural Syst. 2022 Jun;32(6):2250024. doi: 10.1142/S0129065722500241. Epub 2022 May 12.
8. Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
   Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
9. Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition.
   Front Physiol. 2021 Mar 2;12:643202. doi: 10.3389/fphys.2021.643202. eCollection 2021.
10. Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.
    Sensors (Basel). 2021 Jun 27;21(13):4399. doi: 10.3390/s21134399.

Cited by

1. Multi-scale fusion visual attention network for facial micro-expression recognition.
   Front Neurosci. 2023 Jul 27;17:1216181. doi: 10.3389/fnins.2023.1216181. eCollection 2023.
2. Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
   Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
3. A new approach to COVID-19 data mining: A deep spatial-temporal prediction model based on tree structure for traffic revitalization index.
   Data Knowl Eng. 2023 Jul;146:102193. doi: 10.1016/j.datak.2023.102193. Epub 2023 May 16.
4. Human-Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention.
   Sensors (Basel). 2023 Jan 26;23(3):1386. doi: 10.3390/s23031386.
5. Bidirectional parallel echo state network for speech emotion recognition.
   Neural Comput Appl. 2022;34(20):17581-17599. doi: 10.1007/s00521-022-07410-2. Epub 2022 May 31.
6. Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks.
   Comput Intell Neurosci. 2022 Mar 31;2022:7463091. doi: 10.1155/2022/7463091. eCollection 2022.
7. Human-Computer Interaction for Recognizing Speech Emotions Using Multilayer Perceptron Classifier.
   J Healthc Eng. 2022 Mar 28;2022:6005446. doi: 10.1155/2022/6005446. eCollection 2022.
8. Streamflow prediction using an integrated methodology based on convolutional neural network and long short-term memory networks.
   Sci Rep. 2021 Sep 1;11(1):17497. doi: 10.1038/s41598-021-96751-4.