Zhang Zhenxuan, Lu Guanyu
State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China.
School of Cyberspace Security, Beijing Jiaotong University, Beijing 100044, China.
Brain Sci. 2025 Jun 30;15(7):707. doi: 10.3390/brainsci15070707.
Multimodal emotion recognition has emerged as a prominent field in affective computing, offering superior performance compared to single-modality methods. Among physiological signals, electroencephalography (EEG) and electrooculography (EOG) are highly valued for their complementary strengths in emotion recognition. However, the practical application of EEG-based approaches is often hindered by high cost and operational complexity, making EOG a more feasible alternative in real-world scenarios. To address this limitation, this study introduces a novel multimodal knowledge distillation framework designed to improve the practicality of emotion decoding while maintaining high accuracy. The framework includes a multimodal fusion module that extracts and integrates interactive and heterogeneous features, and a unimodal student model that is structurally aligned with the multimodal teacher model for better knowledge alignment. The framework combines EEG and EOG signals into a unified teacher model and distills the fused multimodal features into a simplified EOG-only student model. To facilitate efficient knowledge transfer, the approach incorporates a dynamic feedback mechanism that adjusts, based on performance metrics, the guidance provided by the multimodal model to the unimodal model during distillation. The proposed method was comprehensively evaluated on two datasets containing EEG and EOG signals. On the DEAP dataset, the valence and arousal classification accuracies of the proposed model are 70.38% and 60.41%, respectively; on the BJTU-Emotion dataset, they are 61.31% and 60.31%, respectively. The proposed method achieves state-of-the-art classification performance compared to baseline methods, with statistically significant improvements confirmed by paired t-tests (p < 0.05). The framework effectively transfers knowledge from the multimodal model to the unimodal EOG model, enhancing the practicality of emotion recognition while maintaining high accuracy and thereby expanding its applicability in real-world scenarios.
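To illustrate the kind of training objective the abstract describes, the following is a minimal sketch of teacher-to-student distillation with a dynamically adjusted guidance weight, written in Python with PyTorch. It is an assumption-based illustration, not the authors' implementation: the loss form, the feedback rule, and all function and parameter names (distillation_loss, update_feedback_weight, temperature, lr) are hypothetical.

```python
import torch.nn.functional as F

# Hypothetical sketch of the distillation step described in the abstract:
# a multimodal (EEG+EOG) teacher guides an EOG-only student, and a dynamic
# feedback weight scales the distillation term based on performance metrics.
# Assumed design, not the paper's actual code.

def distillation_loss(student_logits, teacher_logits, labels,
                      feedback_weight, temperature=4.0):
    """Cross-entropy on the hard labels plus KL divergence to the softened
    teacher distribution, with the KL term scaled by a dynamic weight."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return ce + feedback_weight * kl

def update_feedback_weight(prev_weight, student_acc, teacher_acc,
                           lr=0.1, min_w=0.1, max_w=2.0):
    """One plausible feedback rule: strengthen the teacher's guidance while
    the student lags behind the teacher, and relax it as the gap closes."""
    gap = max(teacher_acc - student_acc, 0.0)
    new_weight = prev_weight + lr * gap
    return float(min(max(new_weight, min_w), max_w))
```

In this sketch, the weight returned by update_feedback_weight after each evaluation round is passed into distillation_loss for the next round, so the strength of the multimodal model's guidance tracks how well the EOG-only student is currently performing.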