Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network

Authors

Sun Congshan, Li Haifeng, Ma Lin

Affiliation

Faculty of Computing, Harbin Institute of Technology, Harbin, China.

Publication

Front Psychol. 2023 Jan 9;13:1075624. doi: 10.3389/fpsyg.2022.1075624. eCollection 2022.

DOI:10.3389/fpsyg.2022.1075624

PMID:36698559

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9869168/

Abstract

Speech emotion recognition (SER) is the key to human-computer emotion interaction. However, the nonlinear characteristics of speech emotion are variable, complex, and subtly changing, so accurately recognizing emotions from speech remains a challenge. Empirical mode decomposition (EMD), an effective decomposition method for nonlinear, non-stationary signals, has been successfully used to analyze emotional speech signals. However, the mode mixing problem of EMD degrades the performance of EMD-based methods for SER. Various improved EMD methods have been proposed to alleviate mode mixing, but they still suffer from mode mixing, residual noise, and long computation time, and their main parameters cannot be set adaptively. To overcome these problems, we propose a novel SER framework, named IMEMD-CRNN, that combines an improved version of the masking signal-based EMD (IMEMD) with a convolutional recurrent neural network (CRNN). First, IMEMD is proposed to decompose speech. IMEMD is a novel disturbance-assisted EMD method that can determine the parameters of the masking signals according to the nature of the signal. Second, we extract 43-dimensional time-frequency features that can characterize emotion from the intrinsic mode functions (IMFs) obtained by IMEMD. Finally, we input these features into a CRNN to recognize emotions. In the CRNN, 2D convolutional neural network (CNN) layers capture nonlinear local temporal and frequency information of the emotional speech, and bidirectional gated recurrent unit (BiGRU) layers further learn temporal context information. Experiments on the publicly available TESS and Emo-DB datasets demonstrate the effectiveness of the proposed IMEMD-CRNN framework. The TESS dataset consists of 2,800 utterances covering seven emotions, recorded by two native English speakers. The Emo-DB dataset consists of 535 utterances covering seven emotions, recorded by ten native German speakers. The proposed IMEMD-CRNN framework achieves a state-of-the-art overall accuracy of 100% on the TESS dataset over seven emotions and 93.54% on the Emo-DB dataset over seven emotions. IMEMD alleviates mode mixing and obtains IMFs with less noise and more physical meaning, with significantly improved efficiency. Our IMEMD-CRNN framework significantly improves the performance of emotion recognition.
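As a rough illustration of the masking-signal idea that IMEMD builds on, the sketch below implements the classical masking EMD step: a sinusoidal mask is added to and subtracted from the signal, the first IMF is extracted from each copy, and the two results are averaged so the mask cancels. This is not the authors' IMEMD. The function names (`first_imf`, `masked_first_imf`), the piecewise-linear envelopes (standard EMD uses cubic splines), the fixed number of sifting iterations, and the hand-picked mask frequency and amplitude are all assumptions made for brevity, whereas the abstract states that IMEMD derives the mask parameters from the signal itself.

```python
import math


def _envelope(x, idx):
    """Piecewise-linear envelope through (i, x[i]) for i in idx, held flat at both ends."""
    env = [0.0] * len(x)
    for k in range(idx[0] + 1):
        env[k] = x[idx[0]]
    for a, b in zip(idx, idx[1:]):
        for k in range(a, b + 1):
            env[k] = x[a] + (x[b] - x[a]) * (k - a) / (b - a)
    for k in range(idx[-1], len(x)):
        env[k] = x[idx[-1]]
    return env


def first_imf(x, n_sift=10):
    """Extract a first IMF candidate by repeatedly subtracting the mean envelope (sifting)."""
    h = list(x)
    for _ in range(n_sift):
        maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] >= h[i + 1]]
        minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] <= h[i + 1]]
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        upper = _envelope(h, maxima)
        lower = _envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def masked_first_imf(x, mask_freq, mask_amp, fs):
    """Classical masking EMD step: average the first IMFs of x + mask and x - mask."""
    mask = [mask_amp * math.sin(2 * math.pi * mask_freq * i / fs) for i in range(len(x))]
    plus = first_imf([v + m for v, m in zip(x, mask)])
    minus = first_imf([v - m for v, m in zip(x, mask)])
    return [0.5 * (p + q) for p, q in zip(plus, minus)]


if __name__ == "__main__":
    fs = 200  # hypothetical sampling rate in Hz
    # Toy two-tone signal standing in for a speech frame.
    x = [math.sin(2 * math.pi * 5 * i / fs) + 0.5 * math.sin(2 * math.pi * 40 * i / fs)
         for i in range(400)]
    imf1 = masked_first_imf(x, mask_freq=60.0, mask_amp=1.0, fs=fs)
    print(len(imf1))
```

In a full decomposition the extracted IMF would be subtracted from the signal and the procedure repeated on the residue with a lower mask frequency; the resulting IMFs are what the time-frequency features would then be computed from.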





