Liu Dong, Wang Zhiyong, Wang Lifeng, Chen Longxi
School of Information Engineering, Shandong Youth University of Political Science, Jinan, China.
Front Neurorobot. 2021 Jul 9;15:697634. doi: 10.3389/fnbot.2021.697634. eCollection 2021.
Redundant information and noise generated during single-modal feature extraction make it difficult for traditional learning algorithms to achieve ideal recognition performance. A multi-modal fusion emotion recognition method for speech and facial expressions based on deep learning is therefore proposed. First, a dedicated feature extraction method is set up for each single modality: speech features are extracted with a convolutional neural network-long short-term memory (CNN-LSTM) network, and facial expressions in the video are extracted with the Inception-ResNet-v2 network. Then, a long short-term memory (LSTM) network is used to capture the correlations both between and within modalities. After feature selection by the chi-square test, the single-modality features are concatenated to obtain a unified fusion feature. Finally, the fused features output by the LSTM are fed to the LIBSVM classifier to realize the final emotion recognition. Experimental results show that the recognition accuracies of the proposed method on the MOSI and MELD datasets are 87.56% and 90.06%, respectively, which are better than those of the comparison methods. This lays a theoretical foundation for the application of multimodal fusion in emotion recognition.
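The abstract does not give layer configurations, training details, or the exact fusion order, so the following is only a minimal, hypothetical sketch of the described pipeline: a CNN-LSTM stand-in for the speech branch, a small placeholder CNN in place of Inception-ResNet-v2 (to keep the example self-contained without pretrained weights), an LSTM over the stacked modality vectors for cross-modal correlation, chi-square feature selection, and a LIBSVM-backed classifier via scikit-learn's SVC. The random tensors stand in for MOSI/MELD samples; all dimensions are assumptions.

```python
# Hypothetical sketch of the multimodal fusion pipeline; not the authors' exact model.
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC  # scikit-learn's SVC wraps LIBSVM


class AudioCNNLSTM(nn.Module):
    """Stand-in for the CNN-LSTM speech feature extractor."""
    def __init__(self, n_mels=64, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)

    def forward(self, x):                      # x: (batch, n_mels, time)
        h = self.conv(x).transpose(1, 2)       # (batch, time', 64)
        _, (h_n, _) = self.lstm(h)
        return h_n[-1]                         # (batch, hidden)


class VisualBackbone(nn.Module):
    """Placeholder for Inception-ResNet-v2; a tiny CNN keeps the sketch runnable."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, out_dim),
        )

    def forward(self, x):                      # x: (batch, 3, H, W)
        return self.net(x)


# Toy batch: random tensors stand in for MOSI/MELD samples.
batch, n_classes = 32, 7
audio = torch.randn(batch, 64, 200)           # log-mel spectrogram clips (assumed input)
frames = torch.randn(batch, 3, 112, 112)      # one face crop per clip (assumed input)
labels = np.random.randint(0, n_classes, size=batch)

audio_net, visual_net = AudioCNNLSTM(), VisualBackbone()
fusion_lstm = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)

with torch.no_grad():
    a_feat = audio_net(audio)                 # (batch, 128) speech features
    v_feat = visual_net(frames)               # (batch, 128) facial features
    # Treat the two modality vectors as a 2-step sequence so the LSTM can
    # model cross-modal correlation, then keep its final hidden state.
    seq = torch.stack([a_feat, v_feat], dim=1)    # (batch, 2, 128)
    _, (h_n, _) = fusion_lstm(seq)
    fused = torch.cat([h_n[-1], a_feat, v_feat], dim=1).numpy()

# Chi-square selection requires non-negative inputs, so rescale first.
fused = MinMaxScaler().fit_transform(fused)
selected = SelectKBest(chi2, k=64).fit_transform(fused, labels)

clf = SVC(kernel="rbf").fit(selected, labels)     # LIBSVM-backed classifier
print("train accuracy:", clf.score(selected, labels))
```

In practice the backbones would be pretrained and fine-tuned and the classifier evaluated on held-out MOSI/MELD splits; this sketch only illustrates how the extraction, LSTM fusion, chi-square selection, and SVM stages connect.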