Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning.

Authors

Liu Dong, Wang Zhiyong, Wang Lifeng, Chen Longxi

Affiliation

School of Information Engineering, Shandong Youth University of Political Science, Jinan, China.

Publication information

Front Neurorobot. 2021 Jul 9;15:697634. doi: 10.3389/fnbot.2021.697634. eCollection 2021.

Abstract

Redundant information and noisy data generated during single-modal feature extraction, together with the limitations of traditional learning algorithms, make it difficult to achieve ideal recognition performance. To address this, a multi-modal fusion emotion recognition method for speech and facial expressions based on deep learning is proposed. First, a dedicated feature extraction method is set up for each single modality: speech features are extracted with a convolutional neural network-long short-term memory (CNN-LSTM) network, and facial expression features in the video are extracted with the Inception-ResNet-v2 network. Then, long short-term memory (LSTM) is used to capture the correlations between and within the modalities. After feature selection with the chi-square test, the single-modal features are concatenated to obtain a unified fusion feature. Finally, the fused features output by the LSTM are used as the input of the LIBSVM classifier to perform the final emotion recognition. The experimental results show that the recognition accuracy of the proposed method on the MOSI and MELD datasets is 87.56% and 90.06%, respectively, which is better than that of the other comparison methods. This work lays a theoretical foundation for the application of multimodal fusion in emotion recognition.
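The chi-square feature selection and concatenation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`chi2_scores`, `select_top_k`, `fuse`), the toy feature values, and the choice to keep a fixed number of top-scoring features per modality are all assumptions made for illustration. The CNN-LSTM, Inception-ResNet-v2, LSTM, and LIBSVM stages are omitted, and the scoring follows the common chi-square statistic for non-negative features against class labels.

```python
from typing import List, Sequence, Tuple


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-square score of each non-negative feature column against the labels."""
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            # Observed: feature mass that falls in class c.
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            # Expected: feature mass in class c if feature and label were independent.
            expected = total * sum(1 for label in y if label == c) / n_samples
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(
    X: Sequence[Sequence[float]], y: Sequence[int], k: int
) -> Tuple[List[List[float]], List[int]]:
    """Keep the k highest-scoring columns (hypothetical selection rule)."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in X], keep


def fuse(audio: Sequence[List[float]], visual: Sequence[List[float]]) -> List[List[float]]:
    """Concatenate per-sample feature vectors from the two modalities."""
    return [a + v for a, v in zip(audio, visual)]


# Toy data (assumed): 4 samples, 2 emotion classes.
labels = [0, 0, 1, 1]
audio_features = [[1.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
visual_features = [[3.0, 1.0], [3.0, 1.0], [3.0, 0.0], [3.0, 0.0]]

audio_selected, audio_kept = select_top_k(audio_features, labels, k=2)
visual_selected, visual_kept = select_top_k(visual_features, labels, k=1)
fused = fuse(audio_selected, visual_selected)
```

In this toy data the constant columns carry no class information and score zero, so they are dropped before concatenation. The fused vectors would then be passed to a classifier such as LIBSVM.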

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b80/8300695/d7647490c625/fnbot-15-697634-g0001.jpg
