Li Ping, Li Ao, Li Xinhui, Lv Zhao
School of Computer Science and Technology, Anhui University, Hefei 230601, China.
The Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Deyang 618307, China.
Bioengineering (Basel). 2025 May 15;12(5):528. doi: 10.3390/bioengineering12050528.
Multimodal physiological emotion recognition is challenged by modality heterogeneity and inter-subject variability, which hinder model generalization and robustness. To address these issues, this paper proposes a new framework, the Cross-modal Transformer with Enhanced Learning-Classifying Adversarial Network (CT-ELCAN). The core idea of CT-ELCAN is to shift the focus from conventional signal fusion to the alignment of modality- and subject-invariant emotional representations. By combining a cross-modal Transformer with ELCAN, an adversarially trained feature alignment module, the framework learns such invariant representations. Experimental results on the public datasets DEAP and WESAD demonstrate that CT-ELCAN achieves accuracy improvements of approximately 7% and 5%, respectively, over state-of-the-art models, while also exhibiting enhanced robustness.
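The abstract describes an architecture that pairs cross-modal attention with adversarial alignment. Below is a minimal sketch (not the authors' code) of this general idea using a gradient-reversal layer: an emotion classifier is trained normally while a subject discriminator is trained through reversed gradients, pushing the encoder toward subject-invariant features. All module names, dimensions, and the toy data are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class CrossModalEncoder(nn.Module):
    """Fuses two physiological modalities (e.g., EEG and peripheral signals)
    with cross-attention, then pools to a single emotion representation."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, eeg, peripheral):
        fused, _ = self.attn(query=eeg, key=peripheral, value=peripheral)
        return self.norm(fused + eeg).mean(dim=1)   # (batch, dim)

class AdversarialAlignmentSketch(nn.Module):
    """Hypothetical stand-in for the CT-ELCAN idea: task head plus
    adversarial subject head behind a gradient-reversal layer."""
    def __init__(self, dim=64, n_emotions=2, n_subjects=32, lambd=1.0):
        super().__init__()
        self.encoder = CrossModalEncoder(dim)
        self.emotion_head = nn.Linear(dim, n_emotions)   # task classifier
        self.subject_head = nn.Linear(dim, n_subjects)   # adversarial discriminator
        self.lambd = lambd

    def forward(self, eeg, peripheral):
        z = self.encoder(eeg, peripheral)
        emotion_logits = self.emotion_head(z)
        subject_logits = self.subject_head(GradReverse.apply(z, self.lambd))
        return emotion_logits, subject_logits

# Toy usage: random windows of two modalities projected to a shared dimension.
model = AdversarialAlignmentSketch()
eeg = torch.randn(8, 20, 64)            # (batch, time, dim)
peripheral = torch.randn(8, 20, 64)
emo, subj = model(eeg, peripheral)
loss = nn.functional.cross_entropy(emo, torch.randint(0, 2, (8,))) \
     + nn.functional.cross_entropy(subj, torch.randint(0, 32, (8,)))
loss.backward()                          # encoder receives reversed subject gradients
```

Minimizing the discriminator loss through the reversal layer encourages the encoder to discard subject-specific cues, which is one common way such invariance objectives are implemented.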