School of Cyber Security, Changchun University, Changchun 130022, China.
School of Computer Science and Technology, Changchun University, Changchun 130022, China.
Sensors (Basel). 2022 Jul 25;22(15):5528. doi: 10.3390/s22155528.
With the wide application of social media, public opinion analysis in social networks can no longer rely on text alone, because public opinion information now spans multiple modalities, such as voice, text, and facial expressions. Multi-modal emotion analysis has therefore become the focus of public opinion analysis, and emotion recognition from speech is an important factor restricting it. This paper first explores emotion feature extraction methods for speech and then analyzes processing methods for class-imbalanced data. By comparing different feature fusion methods for text and speech, a multi-modal feature fusion method for class-imbalanced data is proposed to realize multi-modal emotion recognition. Experiments on two publicly available datasets (IEMOCAP and MELD) show that processing multi-modal data with this method yields good fine-grained emotion recognition results, laying a foundation for subsequent social public opinion analysis.
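The abstract does not specify which acoustic features are extracted from speech. As an illustration only, the sketch below shows one common utterance-level feature pipeline (MFCC statistics via the librosa library); the file path, sample rate, and feature dimensions are assumptions, not the paper's method.

```python
# Hypothetical sketch: extracting utterance-level acoustic features with
# librosa MFCCs. The paper's actual feature set is not given in the abstract.
import numpy as np
import librosa

def extract_speech_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Return a fixed-length vector (mean + std of each MFCC band)."""
    y, sr = librosa.load(wav_path, sr=16000)                 # mono, 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    # Pool frame-level coefficients into one utterance-level vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (2*n_mfcc,)

# features = extract_speech_features("utterance.wav")  # e.g. shape (26,)
```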
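Likewise, the abstract mentions processing class-imbalanced data without naming the mechanism. One standard remedy, shown as a hedged sketch here, is weighting the training loss inversely to class frequency; the label counts below are toy values, and the paper's actual balancing method may differ.

```python
# Hypothetical sketch: counteracting class imbalance with inverse-frequency
# class weights in the loss. Not necessarily the paper's approach.
import torch
import torch.nn as nn

labels = torch.tensor([0, 0, 0, 0, 1, 1, 2])      # toy imbalanced label set
counts = torch.bincount(labels).float()           # samples per class
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights

criterion = nn.CrossEntropyLoss(weight=weights)   # rare classes weighted up
logits = torch.randn(len(labels), len(counts))    # stand-in model outputs
loss = criterion(logits, labels)
```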
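Finally, the fusion methods the paper compares are not described in the abstract. The sketch below shows one common baseline, feature-level (early) fusion that concatenates text and speech embeddings before a shared classifier; all layer sizes and the class count are illustrative assumptions.

```python
# Hypothetical sketch: early fusion of text and speech features by
# concatenation, followed by a small classifier. Dimensions are illustrative;
# the paper's actual fusion architecture is not given in the abstract.
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, speech_dim=26, hidden=128, n_classes=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + speech_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, text_feat, speech_feat):
        # Concatenate modality features along the last dimension.
        fused = torch.cat([text_feat, speech_feat], dim=-1)
        return self.net(fused)

model = EarlyFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 26))  # batch of 4
```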