
Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition

Authors

Zhang Hua, Gou Ruoyun, Shang Jili, Shen Fangyao, Wu Yifan, Dai Guojun

Affiliations

School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China.

Key Laboratory of Network Multimedia Technology of Zhejiang Province, Zhejiang University, Hangzhou, China.

Publication information

Front Physiol. 2021 Mar 2;12:643202. doi: 10.3389/fphys.2021.643202. eCollection 2021.

DOI:10.3389/fphys.2021.643202
PMID:33737889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7962985/
Abstract

Speech emotion recognition (SER) is a challenging task because of affective variation across speakers. SER performance relies heavily on the features extracted from speech signals, and building an effective feature-extraction and classification model remains difficult. In this paper, we propose a new SER method, DCNN-BLSTMwA, which combines a Deep Convolution Neural Network (DCNN) with a Bidirectional Long Short-Term Memory with Attention (BLSTMwA) model. We first preprocess the speech samples by data enhancement and dataset balancing. Secondly, we extract three-channel log Mel-spectrograms (static, delta, and delta-delta) as the DCNN input. Then a DCNN model pre-trained on the ImageNet dataset is applied to generate segment-level features, and the segment-level features of each sentence are stacked into utterance-level features. Next, we adopt a BLSTM to learn high-level emotional features for temporal summarization, followed by an attention layer that focuses on emotionally relevant features. Finally, the learned high-level emotional features are fed into a Deep Neural Network (DNN) to predict the final emotion. Experiments on the EMO-DB and IEMOCAP databases achieve an unweighted average recall (UAR) of 87.86% and 68.50%, respectively, outperforming most popular SER methods and demonstrating the effectiveness of the proposed method.
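The abstract does not give the delta formula or window size. The following is a minimal sketch assuming the common HTK-style regression delta with a half-window of n = 2 and edge-frame replication. It starts from an already computed log-Mel spectrogram (a list of frames, each a list of Mel-bin values) and stacks the static, delta, and delta-delta maps as three channels, which matches the three-channel (RGB-like) input layout an ImageNet-pretrained CNN expects. The names `delta` and `three_channel` are illustrative, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # log-Mel map laid out as [frame][mel_bin]


def delta(features: Matrix, n: int = 2) -> Matrix:
    """HTK-style regression delta along the time axis (assumed formula).

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2),
    where out-of-range frames are replaced by the first/last frame.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    num_frames = len(features)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for m in range(len(features[t])):
            acc = 0.0
            for k in range(1, n + 1):
                ahead = features[min(t + k, num_frames - 1)][m]  # clamp at last frame
                behind = features[max(t - k, 0)][m]              # clamp at first frame
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def three_channel(log_mel: Matrix, n: int = 2) -> List[Matrix]:
    """Return [static, delta, delta-delta] as three image-like channels."""
    first = delta(log_mel, n)
    second = delta(first, n)  # delta-delta is the delta of the delta
    return [log_mel, first, second]


# Usage: a toy log-Mel map with 2 Mel bins that rises by 1.0 per frame.
ramp = [[float(t), float(t)] for t in range(12)]
channels = three_channel(ramp)
print(channels[1][5], channels[2][5])  # interior frame: slope 1.0, curvature 0.0
```

Frames near the edges get damped deltas because of the replicated padding. A library routine (for example a Savitzky-Golay-based delta) would give slightly different edge values.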
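The abstract states that segment-level features are stacked into utterance-level features, but it does not specify the segment length, hop, or padding. The sketch below assumes fixed-length, possibly overlapping segments cut along the time axis, with zero-padding on the last segment. `mean_encoder` is a hypothetical stand-in for the ImageNet-pretrained DCNN (it just averages each Mel bin), so only the segment-and-stack bookkeeping is illustrated, not the paper's network.

```python
from typing import Callable, List

Matrix = List[List[float]]  # [frame][mel_bin]
Encoder = Callable[[Matrix], List[float]]


def split_segments(spec: Matrix, seg_len: int, hop: int) -> List[Matrix]:
    """Cut a spectrogram into fixed-length, possibly overlapping segments.

    The last segment is zero-padded so every segment has exactly seg_len frames.
    """
    if not spec:
        raise ValueError("spec must contain at least one frame")
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    n_mels = len(spec[0])
    segments: List[Matrix] = []
    start = 0
    while True:
        seg = [row[:] for row in spec[start:start + seg_len]]
        seg += [[0.0] * n_mels for _ in range(seg_len - len(seg))]  # pad tail
        segments.append(seg)
        if start + seg_len >= len(spec):  # this segment reached the end
            break
        start += hop
    return segments


def mean_encoder(segment: Matrix) -> List[float]:
    """Hypothetical stand-in for the pretrained DCNN: mean of each Mel bin."""
    return [sum(col) / len(segment) for col in zip(*segment)]


def utterance_features(spec: Matrix, seg_len: int, hop: int,
                       encoder: Encoder = mean_encoder) -> Matrix:
    """Encode each segment, then stack the vectors in time order."""
    return [encoder(seg) for seg in split_segments(spec, seg_len, hop)]


spec = [[1.0, 2.0] for _ in range(10)]  # 10 frames, 2 Mel bins
feats = utterance_features(spec, seg_len=4, hop=2)
print(len(feats), feats[0])
```

The resulting [segment][feature] sequence is the kind of input a BLSTM consumes one step at a time.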
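The abstract does not give the attention layer's equations. A common formulation, assumed here rather than taken from the paper, scores each BLSTM output h_t with a learned vector w, normalizes the scores with a softmax, and returns the weighted sum. Time steps with higher scores then dominate the utterance vector passed to the DNN. The weight vector and inputs below are toy values.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Sequence[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Pool per-step BLSTM outputs h_t into one utterance vector (assumed form).

    score_t = w . h_t,  alpha = softmax(scores),  pooled = sum_t alpha_t * h_t
    """
    if not hidden:
        raise ValueError("need at least one time step")
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden]
    alpha = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return pooled, alpha


# Toy BLSTM outputs for 3 time steps (2-dim each) and a toy weight vector.
steps = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
pooled, alpha = attention_pool(steps, w=[1.0, 1.0])
print([round(a, 3) for a in alpha])  # the middle step gets the largest weight
```

In a trained model w would be learned jointly with the BLSTM. Here it is fixed only to show how the weights redistribute across time steps.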
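UAR, the metric reported for EMO-DB and IEMOCAP, averages per-class recall so that every emotion class counts equally regardless of how many utterances it has. This is why it is commonly preferred over plain accuracy on class-imbalanced emotion corpora. A minimal sketch with toy labels (not data from the paper):

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable],
                              y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall; each class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    support = Counter(y_true)                                     # true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Toy imbalanced labels: a classifier that always says "angry".
y_true = ["angry"] * 8 + ["sad"] * 2
y_pred = ["angry"] * 10
print(unweighted_average_recall(y_true, y_pred))  # 0.5, although accuracy is 0.8
```

Always predicting the majority class scores 0.8 accuracy here but only 0.5 UAR, which is the bias UAR is designed to expose.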



Similar articles

1
Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition.
Front Physiol. 2021 Mar 2;12:643202. doi: 10.3389/fphys.2021.643202. eCollection 2021.
2
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
3
Effect on speech emotion classification of a feature selection approach using a convolutional neural network.
PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.
4
Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.
5
Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
6
Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network.
Front Psychol. 2023 Jan 9;13:1075624. doi: 10.3389/fpsyg.2022.1075624. eCollection 2022.
7
Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
8
Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Multi-Attention Module through Speech Spectrograms.
Sensors (Basel). 2021 Sep 1;21(17):5892. doi: 10.3390/s21175892.
9
Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
10
Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network.
Multimed Tools Appl. 2022;81(21):31107-31128. doi: 10.1007/s11042-022-12886-0. Epub 2022 Apr 8.

Cited by

1
Speech emotion classification using attention based network and regularized feature selection.
Sci Rep. 2023 Jul 25;13(1):11990. doi: 10.1038/s41598-023-38868-2.
2
Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks.
Comput Intell Neurosci. 2022 Mar 31;2022:7463091. doi: 10.1155/2022/7463091. eCollection 2022.
3
Human-Computer Interaction for Recognizing Speech Emotions Using Multilayer Perceptron Classifier.
J Healthc Eng. 2022 Mar 28;2022:6005446. doi: 10.1155/2022/6005446. eCollection 2022.