Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning.

Authors

Luna-Jiménez Cristina, Griol David, Callejas Zoraida, Kleinlein Ricardo, Montero Juan M, Fernández-Martínez Fernando

Affiliations

Grupo de Tecnología del Habla y Aprendizaje Automático (THAU Group), Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain.

Department of Software Engineering, CITIC-UGR, University of Granada, Periodista Daniel Saucedo Aranda S/N, 18071 Granada, Spain.

Publication

Sensors (Basel). 2021 Nov 18;21(22):7665. doi: 10.3390/s21227665.

Abstract

Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users' emotional state and their combination enables improvement of system performance.
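The late-fusion step described above can be sketched in a few lines. The snippet below assumes a simple weighted average of each modality's per-class posteriors; the function name, fusion weight, and example values are illustrative, and the paper's exact fusion scheme behind the 80.08% result may differ.

```python
def late_fusion(speech_probs, face_probs, w_speech=0.5):
    """Fuse per-class posteriors from two modalities by weighted average.

    w_speech is an illustrative weight; the exact fusion function used
    in the paper is not specified here.
    """
    fused = [w_speech * s + (1.0 - w_speech) * f
             for s, f in zip(speech_probs, face_probs)]
    # Return the index of the winning emotion class
    return max(range(len(fused)), key=fused.__getitem__)

# Hypothetical 8-class posteriors (RAVDESS covers eight emotions)
speech = [0.05, 0.10, 0.40, 0.05, 0.10, 0.10, 0.10, 0.10]
face   = [0.05, 0.05, 0.25, 0.35, 0.10, 0.05, 0.10, 0.05]

print(late_fusion(speech, face))  # prints 2: class 2 wins after fusion
```

With equal weights, a class that is moderately supported by both modalities can outrank a class that only one modality favors strongly, which is the usual motivation for combining complementary recognizers this way.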

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/7356230b261b/sensors-21-07665-g0A1.jpg
