
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.

Affiliations

Department of Computer Engineering, University of Engineering and Technology, Taxila 47050, Pakistan.

Department of Software Engineering, University of Engineering and Technology, Taxila 47050, Pakistan.

Publication

Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.

DOI: 10.3390/s20216008
PMID: 33113907
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7660211/
Abstract

Speech emotion recognition (SER) plays a significant role in human-machine interaction. Emotion recognition from speech and its precise classification is a challenging task because a machine is unable to understand its context. For an accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used for emotional classification from speech signals; however, they are not efficient enough to accurately depict the emotional states of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotional datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests, the k-nearest neighbors algorithm, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio Visual Dataset of Emotional Speech and Song (RAVDESS). Our proposed method achieves an accuracy of 95.10% for Emo-DB, 82.10% for SAVEE, 83.80% for IEMOCAP, and 81.30% for RAVDESS in speaker-dependent SER experiments. Moreover, for speaker-independent SER, our method outperforms existing approaches based on handcrafted features.

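The pipeline the abstract describes — features extracted by a pretrained DCNN, a correlation-based feature selection step, then a conventional classifier such as an SVM — can be sketched roughly as follows. This is a hypothetical illustration on random stand-in features: it approximates the paper's correlation-based selection with scikit-learn's ANOVA-based `SelectKBest`, and none of the paper's actual DCNN features, datasets, or parameters are used.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for pretrained-DCNN activations: 200 utterances x 512 features.
X = rng.normal(size=(200, 512))
y = rng.integers(0, 4, size=200)  # 4 emotion classes, e.g. angry/happy/sad/neutral
X[:, :10] += y[:, None] * 0.8     # inject weak class signal so selection has something to find

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Feature selection (ANOVA F-score proxy for the paper's correlation criterion)
# followed by an RBF-kernel SVM, one of the classifiers the paper evaluates.
clf = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=64),
    SVC(kernel="rbf"),
)
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

In practice the feature matrix would come from the activations of a pretrained network applied to spectrogram inputs, and `k` would be tuned per dataset.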

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/de57d304989a/sensors-20-06008-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/85ed89afae8b/sensors-20-06008-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/830aaf5f351a/sensors-20-06008-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/c6608aac6339/sensors-20-06008-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/3256d362d807/sensors-20-06008-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/596c18fe814a/sensors-20-06008-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/f1e31cc5e4bb/sensors-20-06008-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/546e140d1632/sensors-20-06008-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b122/7660211/4f1a2520766b/sensors-20-06008-g009.jpg

Similar Articles

1
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
2
Effect on speech emotion classification of a feature selection approach using a convolutional neural network.
PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.
3
Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
4
Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
5
Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
6
A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition.
Sensors (Basel). 2019 Dec 28;20(1):183. doi: 10.3390/s20010183.
7
Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition.
Sensors (Basel). 2020 Nov 23;20(22):6688. doi: 10.3390/s20226688.
8
Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network.
Multimed Tools Appl. 2022;81(21):31107-31128. doi: 10.1007/s11042-022-12886-0. Epub 2022 Apr 8.
9
Feature selection enhancement and feature space visualization for speech-based emotion recognition.
PeerJ Comput Sci. 2022 Nov 4;8:e1091. doi: 10.7717/peerj-cs.1091. eCollection 2022.
10
Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition.
Front Physiol. 2021 Mar 2;12:643202. doi: 10.3389/fphys.2021.643202. eCollection 2021.

Cited By

1
Speech emotion classification using attention based network and regularized feature selection.
Sci Rep. 2023 Jul 25;13(1):11990. doi: 10.1038/s41598-023-38868-2.
2
Human-Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention.
Sensors (Basel). 2023 Jan 26;23(3):1386. doi: 10.3390/s23031386.
3
Establishment and psychometric characteristics of emotional words list for suicidal risk assessment in speech emotion recognition.
Front Psychiatry. 2022 Nov 11;13:1022036. doi: 10.3389/fpsyt.2022.1022036. eCollection 2022.

References

1
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English.
PLoS One. 2018 May 16;13(5):e0196391. doi: 10.1371/journal.pone.0196391. eCollection 2018.
2
Evaluating deep learning architectures for Speech Emotion Recognition.
Neural Netw. 2017 Aug;92:60-68. doi: 10.1016/j.neunet.2017.02.013. Epub 2017 Mar 21.
4
Speech Emotion Recognition Based on Modified ReliefF.
Sensors (Basel). 2022 Oct 25;22(21):8152. doi: 10.3390/s22218152.
5
Vector learning representation for generalized speech emotion recognition.
Heliyon. 2022 Mar 28;8(3):e09196. doi: 10.1016/j.heliyon.2022.e09196. eCollection 2022 Mar.
6
Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks.
Comput Intell Neurosci. 2022 Mar 31;2022:7463091. doi: 10.1155/2022/7463091. eCollection 2022.
7
Emotional Speech Recognition Using Deep Neural Networks.
Sensors (Basel). 2022 Feb 12;22(4):1414. doi: 10.3390/s22041414.
8
The Impact of Attention Mechanisms on Speech Emotion Recognition.
Sensors (Basel). 2021 Nov 12;21(22):7530. doi: 10.3390/s21227530.
9
Incorporating Interpersonal Synchronization Features for Automatic Emotion Recognition from Visual and Audio Data during Communication.
Sensors (Basel). 2021 Aug 6;21(16):5317. doi: 10.3390/s21165317.
10
Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition.
Sensors (Basel). 2021 Jun 20;21(12):4233. doi: 10.3390/s21124233.