Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.

Affiliation

Interaction Technology Laboratory, Department of Software, Sejong University, Seoul 05006, Korea.

Publication information

Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.

DOI:10.3390/s20185212
PMID:32932723
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7570673/
Abstract

Artificial intelligence (AI) and machine learning (ML) are employed to make systems smarter. Today, the speech emotion recognition (SER) system evaluates the emotional state of the speaker by investigating his/her speech signal. Emotion recognition is a challenging task for a machine. In addition, making it smarter so that the emotions are efficiently recognized by AI is equally challenging. The speech signal is quite hard to examine using signal processing methods because it consists of different frequencies and features that vary according to emotions, such as anger, fear, sadness, happiness, boredom, disgust, and surprise. Even though different algorithms are being developed for SER, the success rates remain very low, depending on the language, the emotions, and the database. In this paper, we propose a new lightweight and effective SER model that has a low computational complexity and a high recognition accuracy. The suggested method uses a convolutional neural network (CNN) to learn deep frequency features that have more discriminative power for SER, using plain rectangular filters with a modified pooling strategy. The proposed CNN model was trained on the frequency features extracted from the speech data and was then tested to predict the emotions. The proposed SER model was evaluated on two benchmarks, the interactive emotional dyadic motion capture (IEMOCAP) and the Berlin emotional speech database (EMO-DB) speech datasets, on which it obtained recognition results of 77.01% and 92.02%, respectively. The experimental results demonstrated that the proposed CNN-based SER system can achieve a better recognition performance than the state-of-the-art SER systems.
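
As a rough illustration of what a rectangular filter does to a frequency representation, the sketch below slides a tall, narrow kernel over a toy spectrogram (rows are frequency bins, columns are time frames). It is not the authors' implementation: the abstract gives no kernel sizes, strides, or pooling details, so the 3x1 kernel, the helper name `conv2d_valid`, the toy input, and the use of a stride as a stand-in for downsampling are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, kernel: Matrix, stride: Tuple[int, int] = (1, 1)) -> Matrix:
    """Cross-correlate x with kernel using 'valid' padding (hypothetical helper)."""
    kh, kw = len(kernel), len(kernel[0])  # kernel height (frequency) and width (time)
    sh, sw = stride
    out_h = (len(x) - kh) // sh + 1
    out_w = (len(x[0]) - kw) // sw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += x[i * sh + a][j * sw + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


# Toy "spectrogram": 4 frequency bins x 3 time frames (made-up values).
spec = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

# Assumed rectangular kernel: 3 bins tall, 1 frame wide, so each output
# value mixes neighbouring frequencies within a single time frame.
rect_kernel = [[1.0], [1.0], [1.0]]

features = conv2d_valid(spec, rect_kernel)
# Assumption: a stride of 2 along frequency downsamples the map in place of
# a separate pooling layer; the paper's modified pooling strategy may differ.
downsampled = conv2d_valid(spec, rect_kernel, stride=(2, 1))
```

With this input, `features` has shape 2x3 and `downsampled` has shape 1x3: the kernel spans several frequency bins but only one time frame, so the time resolution is preserved while the frequency axis is summarized.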


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e413/7570673/bc178e1266f6/sensors-20-05212-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e413/7570673/a37f9a329426/sensors-20-05212-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e413/7570673/bb93f628b903/sensors-20-05212-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e413/7570673/d2acf4a5c638/sensors-20-05212-g004a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e413/7570673/edb03e000c2c/sensors-20-05212-g005.jpg

Similar articles

1. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
   Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
2. Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
   Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
3. A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition.
   Sensors (Basel). 2019 Dec 28;20(1):183. doi: 10.3390/s20010183.
4. A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
   PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
5. Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
   Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.
6. IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients.
   Sensors (Basel). 2023 Mar 8;23(6):2948. doi: 10.3390/s23062948.
7. A Hybrid Time-Distributed Deep Neural Architecture for Speech Emotion Recognition.
   Int J Neural Syst. 2022 Jun;32(6):2250024. doi: 10.1142/S0129065722500241. Epub 2022 May 12.
8. Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
   Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
9. DSTCNet: Deep Spectro-Temporal-Channel Attention Network for Speech Emotion Recognition.
   IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):188-197. doi: 10.1109/TNNLS.2023.3304516. Epub 2025 Jan 7.
10. Speech emotion analysis using convolutional neural network (CNN) and gamma classifier-based error correcting output codes (ECOC).
    Sci Rep. 2023 Nov 21;13(1):20398. doi: 10.1038/s41598-023-47118-4.

Cited by

1. Convolutional neural network in rice disease recognition: accuracy, speed and lightweight.
   Front Plant Sci. 2023 Nov 1;14:1269371. doi: 10.3389/fpls.2023.1269371. eCollection 2023.
2. Speech emotion classification using attention based network and regularized feature selection.
   Sci Rep. 2023 Jul 25;13(1):11990. doi: 10.1038/s41598-023-38868-2.
3. Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
   Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
4. IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients.
   Sensors (Basel). 2023 Mar 8;23(6):2948. doi: 10.3390/s23062948.
5. A Bimodal Emotion Recognition Approach through the Fusion of Electroencephalography and Facial Sequences.
   Diagnostics (Basel). 2023 Mar 4;13(5):977. doi: 10.3390/diagnostics13050977.
6. Human-Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention.
   Sensors (Basel). 2023 Jan 26;23(3):1386. doi: 10.3390/s23031386.
7. Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network.
   Front Psychol. 2023 Jan 9;13:1075624. doi: 10.3389/fpsyg.2022.1075624. eCollection 2022.
8. Establishment and psychometric characteristics of emotional words list for suicidal risk assessment in speech emotion recognition.
   Front Psychiatry. 2022 Nov 11;13:1022036. doi: 10.3389/fpsyt.2022.1022036. eCollection 2022.
9. M1M2: Deep-Learning-Based Real-Time Emotion Recognition from Neural Activity.
   Sensors (Basel). 2022 Nov 3;22(21):8467. doi: 10.3390/s22218467.
10. Iris Recognition Method Based on Parallel Iris Localization Algorithm and Deep Learning Iris Verification.
    Sensors (Basel). 2022 Oct 12;22(20):7723. doi: 10.3390/s22207723.

References

1. Image Encryption Based on Pixel-Level Diffusion with Dynamic Filtering and DNA-Level Permutation with 3D Latin Cubes.
   Entropy (Basel). 2019 Mar 24;21(3):319. doi: 10.3390/e21030319.
2. A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition.
   Sensors (Basel). 2019 Dec 28;20(1):183. doi: 10.3390/s20010183.
3. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.
   BMC Genomics. 2020 Jan 2;21(1):6. doi: 10.1186/s12864-019-6413-7.
4. Exploiting Unlabeled Data in CNNs by Self-Supervised Learning to Rank.
   IEEE Trans Pattern Anal Mach Intell. 2019 Aug;41(8):1862-1878. doi: 10.1109/TPAMI.2019.2899857. Epub 2019 Feb 15.
5. A Review of Emotion Recognition Using Physiological Signals.
   Sensors (Basel). 2018 Jun 28;18(7):2074. doi: 10.3390/s18072074.
6. Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN.
   Sensors (Basel). 2017 Jul 24;17(7):1694. doi: 10.3390/s17071694.
7. Evaluating deep learning architectures for Speech Emotion Recognition.
   Neural Netw. 2017 Aug;92:60-68. doi: 10.1016/j.neunet.2017.02.013. Epub 2017 Mar 21.
8. Face recognition: a convolutional neural-network approach.
   IEEE Trans Neural Netw. 1997;8(1):98-113. doi: 10.1109/72.554195.