Emotion Recognition Using EEG Signals and Audiovisual Features with Contrastive Learning.

Authors

Lee Ju-Hwan, Kim Jin-Young, Kim Hyoung-Gook

Affiliations

Department of Intelligent Electronics and Computer Engineering, Chonnam National University, 77 Yongbong-ro, Buk-gu, Gwangju 61186, Republic of Korea.

Department of Electronic Convergence Engineering, Kwangwoon University, 20 Gwangun-ro, Nowon-gu, Seoul 01897, Republic of Korea.

Publication

Bioengineering (Basel). 2024 Oct 3;11(10):997. doi: 10.3390/bioengineering11100997.

Abstract

Multimodal emotion recognition has emerged as a promising approach to capture the complex nature of human emotions by integrating information from various sources such as physiological signals, visual behavioral cues, and audio-visual content. However, current methods often struggle with effectively processing redundant or conflicting information across modalities and may overlook implicit inter-modal correlations. To address these challenges, this paper presents a novel multimodal emotion recognition framework that integrates audio-visual features with viewers' EEG data to enhance emotion classification accuracy. The proposed approach employs modality-specific encoders to extract spatiotemporal features, which are then aligned through contrastive learning to capture inter-modal relationships. Additionally, cross-modal attention mechanisms are incorporated for effective feature fusion across modalities. The framework, comprising pre-training, fine-tuning, and testing phases, is evaluated on multiple datasets of emotional responses. The experimental results demonstrate that the proposed multimodal approach, which combines audio-visual features with EEG data, is highly effective in recognizing emotions, highlighting its potential for advancing emotion recognition systems.
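The abstract names two mechanisms worth making concrete: contrastive alignment of modality embeddings and cross-modal attention fusion. Below is a minimal PyTorch sketch of both. The paper's actual encoders, dimensions, and loss are not given in the abstract, so every module name, hyperparameter, and the symmetric CLIP-style InfoNCE objective used here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of (1) contrastive alignment of EEG and audio-visual
# embeddings and (2) cross-modal attention fusion, as described in the
# abstract. All names, dimensions, and the InfoNCE-style loss are
# assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveAligner(nn.Module):
    """Projects each modality into a shared space and computes a
    symmetric InfoNCE loss over a batch of paired (EEG, AV) clips."""

    def __init__(self, eeg_dim: int, av_dim: int, shared_dim: int = 128,
                 temperature: float = 0.07):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, shared_dim)
        self.av_proj = nn.Linear(av_dim, shared_dim)
        self.temperature = temperature

    def forward(self, eeg_feat: torch.Tensor, av_feat: torch.Tensor):
        # L2-normalize so the dot product is a cosine similarity.
        z_eeg = F.normalize(self.eeg_proj(eeg_feat), dim=-1)
        z_av = F.normalize(self.av_proj(av_feat), dim=-1)
        logits = z_eeg @ z_av.t() / self.temperature  # (B, B) similarities
        targets = torch.arange(logits.size(0), device=logits.device)
        # Matching (EEG_i, AV_i) pairs are positives; all others negatives.
        loss = (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2
        return loss, z_eeg, z_av


class CrossModalFusion(nn.Module):
    """EEG tokens attend to audio-visual tokens (and vice versa);
    the pooled result feeds an emotion classifier head."""

    def __init__(self, dim: int = 128, heads: int = 4, n_classes: int = 4):
        super().__init__()
        self.eeg_to_av = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.av_to_eeg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_classes)

    def forward(self, eeg_tokens: torch.Tensor, av_tokens: torch.Tensor):
        # Queries come from one modality, keys/values from the other.
        eeg_ctx, _ = self.eeg_to_av(eeg_tokens, av_tokens, av_tokens)
        av_ctx, _ = self.av_to_eeg(av_tokens, eeg_tokens, eeg_tokens)
        fused = torch.cat([eeg_ctx.mean(dim=1), av_ctx.mean(dim=1)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    B, T = 8, 16
    eeg_feat = torch.randn(B, 256)        # pooled EEG encoder output
    av_feat = torch.randn(B, 512)         # pooled audio-visual output
    loss, _, _ = ContrastiveAligner(256, 512)(eeg_feat, av_feat)
    logits = CrossModalFusion()(torch.randn(B, T, 128), torch.randn(B, T, 128))
    print(loss.item(), logits.shape)      # scalar loss, (B, n_classes)
```

Mapped onto the phases the abstract describes, the contrastive loss would drive pre-training on paired clips, while the fusion module and classifier head would be fine-tuned on emotion labels; again, this mapping is inferred from the abstract rather than taken from the paper's code.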

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ced/11504283/5ee692447f88/bioengineering-11-00997-g001.jpg
