Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.

Authors

Nam Youngja, Lee Chankyu

Affiliations

Humanities Research Institute, Chung-Ang University, Seoul 06974, Korea.

Department of Korean Language and Literature, Chung-Ang University, Seoul 06974, Korea.

Publication information

Sensors (Basel). 2021 Jun 27;21(13):4399. doi: 10.3390/s21134399.

Abstract

Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and limited evidence is available for their applicability in emotional speech denoising. In this study, a cascaded denoising CNN (DnCNN)-CNN architecture is proposed to classify emotions from Korean and German speech in noisy conditions. The proposed architecture consists of two stages. In the first stage, the DnCNN exploits the concept of residual learning to perform denoising; in the second stage, the CNN performs the classification. The classification results for real datasets show that the DnCNN-CNN outperforms the baseline CNN in overall accuracy for both languages. For Korean speech, the DnCNN-CNN achieves an accuracy of 95.8%, whereas the accuracy of the CNN is marginally lower (93.6%). For German speech, the DnCNN-CNN has an overall accuracy of 59.3-76.6%, whereas the CNN has an overall accuracy of 39.4-58.1%. These results demonstrate the feasibility of applying the DnCNN with residual learning to speech denoising and the effectiveness of the CNN-based approach in speech emotion recognition. Our findings provide new insights into speech emotion recognition in adverse conditions and have implications for language-universal speech emotion recognition.
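The two-stage data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the noise estimator and the classifier below are hypothetical toy stand-ins (a moving-average residual and an energy threshold) for the trained DnCNN and CNN, and the labels `high-arousal` and `low-arousal` are invented for illustration. The sketch only shows the residual-learning formulation, in which the first stage predicts the noise and subtracts it from the noisy input before the second stage classifies the result.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def moving_average(signal: Sequence[float], width: int = 3) -> Signal:
    """Smooth a signal with a centered window that shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def toy_noise_estimator(noisy: Sequence[float]) -> Signal:
    """Hypothetical stand-in for a trained DnCNN: predicts the noise component.

    Assumption: noise is whatever deviates from the local moving average.
    """
    smooth = moving_average(noisy)
    return [y - s for y, s in zip(noisy, smooth)]


def residual_denoise(
    noisy: Sequence[float],
    estimate_noise: Callable[[Sequence[float]], Signal],
) -> Signal:
    """Residual learning: clean estimate = noisy input - predicted noise."""
    noise = estimate_noise(noisy)
    return [y - v for y, v in zip(noisy, noise)]


def toy_classifier(features: Sequence[float]) -> str:
    """Hypothetical stand-in for the second-stage CNN: thresholds mean energy."""
    energy = sum(x * x for x in features) / len(features)
    return "high-arousal" if energy > 0.5 else "low-arousal"


def cascade(
    noisy: Sequence[float],
    estimate_noise: Callable[[Sequence[float]], Signal],
    classify: Callable[[Sequence[float]], str],
) -> str:
    """Stage 1 denoises the input; stage 2 classifies the denoised signal."""
    return classify(residual_denoise(noisy, estimate_noise))


if __name__ == "__main__":
    noisy_input = [0.0, 1.0, 0.0, 1.0, 0.0]
    print(cascade(noisy_input, toy_noise_estimator, toy_classifier))
```

In a real system both callables would be trained networks operating on spectral features rather than raw toy sequences; passing them in as arguments keeps the cascade structure separate from the choice of models.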

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6113/8271804/35d50eaea614/sensors-21-04399-g001.jpg
