
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning.

Authors

Luna-Jiménez Cristina, Griol David, Callejas Zoraida, Kleinlein Ricardo, Montero Juan M, Fernández-Martínez Fernando

Affiliations

Grupo de Tecnología del Habla y Aprendizaje Automático (THAU Group), Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain.

Department of Software Engineering, CITIC-UGR, University of Granada, Periodista Daniel Saucedo Aranda S/N, 18071 Granada, Spain.

Publication

Sensors (Basel). 2021 Nov 18;21(22):7665. doi: 10.3390/s21227665.

DOI:10.3390/s21227665
PMID:34833739
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8618559/
Abstract

Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users' emotional state and their combination enables improvement of system performance.
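The evaluation protocol and fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how subjects were assigned to folds or which late-fusion rule was used, so the round-robin split, the weighted averaging of posteriors, and all function names (`subject_wise_folds`, `late_fusion`, `predict`) are assumptions made for illustration. The eight labels and the 24 actors follow the public RAVDESS description, not this abstract.

```python
from typing import List, Sequence

# Eight RAVDESS emotion classes (label order here is an assumption).
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]


def subject_wise_folds(actor_ids: Sequence[int], n_folds: int = 5) -> List[List[int]]:
    """Assign whole actors to folds so that no actor appears in two folds.

    Round-robin assignment is a hypothetical choice; any split that keeps
    each subject inside a single fold is subject-wise.
    """
    unique = sorted(set(actor_ids))
    return [unique[i::n_folds] for i in range(n_folds)]


def late_fusion(p_speech: Sequence[float], p_face: Sequence[float],
                w_speech: float = 0.5) -> List[float]:
    """Combine per-class posteriors from two modalities by weighted averaging.

    Weighted averaging is one common late-fusion rule, used here as a
    stand-in; the result is renormalised to sum to 1.
    """
    if len(p_speech) != len(p_face):
        raise ValueError("both modalities must score the same classes")
    fused = [w_speech * s + (1.0 - w_speech) * f
             for s, f in zip(p_speech, p_face)]
    total = sum(fused)
    return [x / total for x in fused]


def predict(p_speech: Sequence[float], p_face: Sequence[float],
            w_speech: float = 0.5) -> str:
    """Return the emotion label with the highest fused posterior."""
    fused = late_fusion(p_speech, p_face, w_speech)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


if __name__ == "__main__":
    folds = subject_wise_folds(range(1, 25))  # actors 1..24
    print([len(f) for f in folds])

    # Made-up posteriors: speech leans "angry", face leans "happy".
    speech = [0.05, 0.05, 0.30, 0.05, 0.40, 0.05, 0.05, 0.05]
    face = [0.05, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05]
    print(predict(speech, face))
```

In this sketch each modality's classifier is trained and scored independently, and only the output posteriors are combined, which is what distinguishes late fusion from feature-level (early) fusion. Splitting by actor rather than by clip keeps every recording of a given speaker out of the training folds when that speaker is being tested.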


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/ada692e5ae86/sensors-21-07665-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/7356230b261b/sensors-21-07665-g0A1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/83b28b0d7a92/sensors-21-07665-g0A2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/4de17433f332/sensors-21-07665-g0A3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/6b9fa8ce672f/sensors-21-07665-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/1ef9e01961cf/sensors-21-07665-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/fa6d565a6aac/sensors-21-07665-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/cc5fe5fbf71b/sensors-21-07665-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/b2dcdda37f85/sensors-21-07665-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/f11cbac298db/sensors-21-07665-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da05/8618559/99d364f5d7a5/sensors-21-07665-g007.jpg
