Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.

Author information

Nam Youngja, Lee Chankyu

Affiliations

Humanities Research Institute, Chung-Ang University, Seoul 06974, Korea.

Department of Korean Language and Literature, Chung-Ang University, Seoul 06974, Korea.

Publication information

Sensors (Basel). 2021 Jun 27;21(13):4399. doi: 10.3390/s21134399.

DOI: 10.3390/s21134399

PMID: 34199027

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8271804/

Abstract

Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and limited evidence is available for their applicability in emotional speech denoising. In this study, a cascaded denoising CNN (DnCNN)-CNN architecture is proposed to classify emotions from Korean and German speech in noisy conditions. The proposed architecture consists of two stages. In the first stage, the DnCNN exploits the concept of residual learning to perform denoising; in the second stage, the CNN performs the classification. The classification results for real datasets show that the DnCNN-CNN outperforms the baseline CNN in overall accuracy for both languages. For Korean speech, the DnCNN-CNN achieves an accuracy of 95.8%, whereas the accuracy of the CNN is marginally lower (93.6%). For German speech, the DnCNN-CNN has an overall accuracy of 59.3-76.6%, whereas the CNN has an overall accuracy of 39.4-58.1%. These results demonstrate the feasibility of applying the DnCNN with residual learning to speech denoising and the effectiveness of the CNN-based approach in speech emotion recognition. Our findings provide new insights into speech emotion recognition in adverse conditions and have implications for language-universal speech emotion recognition.
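The two-stage idea described in the abstract (stage 1 removes noise via residual learning, stage 2 classifies the result) can be sketched as follows. This is not the authors' implementation: the abstract does not give layer counts, input features or training details, so as a simplifying assumption the DnCNN is replaced by a single linear filter fitted by least squares, the CNN classifier by a hypothetical `classify` callable, and the speech data by synthetic sinusoids in Gaussian white noise. All function names (`windows`, `fit_residual_filter`, `denoise`, `cascade`, `toy_classifier`) are hypothetical. What carries over is the residual-learning step: the denoiser is trained to predict the noise v = y - x rather than the clean signal x, and stage 1 outputs x_hat = y - R(y).

```python
import numpy as np

def windows(y, k):
    """Row t holds y[t - k//2 : t + k//2 + 1], zero-padded at the edges (k odd)."""
    pad = k // 2
    yp = np.pad(y, pad)
    return np.stack([yp[t:t + k] for t in range(len(y))])

def fit_residual_filter(noisy, clean, k=9):
    """Residual learning: fit R so that R(y) approximates v = y - x (the noise).
    Assumption: a linear least-squares filter stands in for a trained DnCNN."""
    X = np.concatenate([windows(y, k) for y in noisy])
    v = np.concatenate([y - x for y, x in zip(noisy, clean)])
    w, *_ = np.linalg.lstsq(X, v, rcond=None)
    return w

def denoise(y, w):
    """Stage 1: the clean estimate is the input minus the predicted residual."""
    return y - windows(y, len(w)) @ w

def cascade(y, w, classify):
    """Stage 2 sees only the denoised signal; `classify` stands in for the CNN."""
    return classify(denoise(y, w))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Toy data (assumption): two "emotion" classes as sinusoids of different frequency.
rng = np.random.default_rng(0)
t = np.arange(256) / 256
clean = [np.sin(2 * np.pi * f * t) for f in (4, 12) for _ in range(20)]
labels = [c for c in (0, 1) for _ in range(20)]
noisy = [x + 0.5 * rng.standard_normal(t.size) for x in clean]

w = fit_residual_filter(noisy, clean)

def toy_classifier(x):
    """Hypothetical stage-2 stand-in: class 1 if the dominant FFT bin exceeds 8."""
    return int(np.argmax(np.abs(np.fft.rfft(x))) > 8)

before = np.mean([mse(y, x) for y, x in zip(noisy, clean)])
after = np.mean([mse(denoise(y, w), x) for y, x in zip(noisy, clean)])
preds = [cascade(y, w, toy_classifier) for y in noisy]
print(f"MSE before {before:.3f}, after {after:.3f}")
print("accuracy", np.mean([p == l for p, l in zip(preds, labels)]))
```

Because the least-squares fit could always fall back to R = 0 (which returns the input unchanged), the training-set error after denoising cannot exceed the error before it. The original DnCNN of Zhang et al. (2017) replaces the single filter with a stack of convolution, batch-normalization and ReLU layers trained by gradient descent, but it keeps the same output rule x_hat = y - R(y).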

Figure (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6113/8271804/35d50eaea614/sensors-21-04399-g001.jpg

Similar articles

Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.

Sensors (Basel). 2021 Jun 27;21(13):4399. doi: 10.3390/s21134399.

A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.

PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.

Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.

Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.

A Hybrid Time-Distributed Deep Neural Architecture for Speech Emotion Recognition.

Int J Neural Syst. 2022 Jun;32(6):2250024. doi: 10.1142/S0129065722500241. Epub 2022 May 12.

A convolutional neural network for ultra-low-dose CT denoising and emphysema screening.

Med Phys. 2019 Sep;46(9):3941-3950. doi: 10.1002/mp.13666. Epub 2019 Jul 17.

Introducing Swish and Parallelized Blind Removal Improves the Performance of a Convolutional Neural Network in Denoising MR Images.

Magn Reson Med Sci. 2021 Dec 1;20(4):410-424. doi: 10.2463/mrms.mp.2020-0073. Epub 2021 Feb 11.

Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network.

Front Psychol. 2023 Jan 9;13:1075624. doi: 10.3389/fpsyg.2022.1075624. eCollection 2022.

Automated accurate emotion recognition system using rhythm-specific deep convolutional neural network technique with multi-channel EEG signals.

Comput Biol Med. 2021 Jul;134:104428. doi: 10.1016/j.compbiomed.2021.104428. Epub 2021 May 6.

Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.

Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.

Human-Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention.

Sensors (Basel). 2023 Jan 26;23(3):1386. doi: 10.3390/s23031386.

Cited by

Facial recognition and analysis: A machine learning-based pathway to corporate mental health management.

Digit Health. 2025 Apr 15;11:20552076251335542. doi: 10.1177/20552076251335542. eCollection 2025 Jan-Dec.

Birdsong classification based on ensemble multi-scale convolutional neural network.

Sci Rep. 2022 May 23;12(1):8636. doi: 10.1038/s41598-022-12121-8.

BanglaSER: A speech emotion recognition dataset for the Bangla language.

Data Brief. 2022 Mar 22;42:108091. doi: 10.1016/j.dib.2022.108091. eCollection 2022 Jun.

Automated lung ultrasound scoring for evaluation of coronavirus disease 2019 pneumonia using two-stage cascaded deep learning model.

Biomed Signal Process Control. 2022 May;75:103561. doi: 10.1016/j.bspc.2022.103561. Epub 2022 Feb 7.
