Bi Xin, Zhang Tian
School of Literature, Heilongjiang University, Harbin, Heilongjiang, China.
Department of Languages and Literary Studies, Lafayette College, Easton, PA, United States.
PeerJ Comput Sci. 2025 May 23;11:e2890. doi: 10.7717/peerj-cs.2890. eCollection 2025.
With the acceleration of globalization, cross-cultural communication has become a crucial issue in many fields. Emotion, as an essential component of communication, plays a key role in improving understanding and interaction efficiency across cultures. However, accurately recognizing emotions across cultural backgrounds remains a major challenge in affective computing, particularly because traditional approaches are limited in multimodal feature fusion and temporal dependency modeling. To address this, we propose the TAF-ATRM framework, which integrates Transformer and multi-head attention mechanisms for cross-cultural emotion recognition. Specifically, the framework employs bidirectional encoder representations from transformers (BERT) for semantic feature extraction from text, and Mel-frequency cepstral coefficients (MFCC) and a residual neural network (ResNet) to capture critical features from speech and facial expressions, respectively, thereby enhancing multimodal emotion recognition capability. To improve the fusion of multimodal data, the Transformer is used for temporal feature modeling, while multi-head attention reinforces feature representation by capturing complex inter-modal dependencies. The framework is evaluated on the MOSI and MOSEI datasets, where experimental results demonstrate that TAF-ATRM outperforms traditional methods in emotion classification accuracy and robustness, particularly in cross-cultural emotion recognition tasks. This study provides a strong technical foundation for future advances in multimodal emotion analysis and cross-cultural affective computing.
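The abstract describes fusing text, speech, and facial features via multi-head attention over modality sequences. The paper does not give implementation details, so the following is only a minimal NumPy sketch of the general idea: hypothetical per-modality feature sequences (standing in for BERT, MFCC, and ResNet outputs) are concatenated along the time axis and passed through a single multi-head self-attention layer with random weights in place of learned projections. All dimensions, sequence lengths, and the mean-pooling step are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def multi_head_attention(x, num_heads, rng):
    """One multi-head self-attention pass over x of shape (seq_len, d_model).

    Random Gaussian projections stand in for learned weight matrices."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    def split(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split(q), split(k), split(v)
    # Scaled dot-product attention, softmax over the key dimension.
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Recombine heads: (num_heads, seq_len, d_head) -> (seq_len, d_model).
    out = (weights @ vh).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
# Hypothetical modality sequences, assumed already projected to d_model=64:
text  = rng.standard_normal((12, 64))   # stand-in for BERT token embeddings
audio = rng.standard_normal((20, 64))   # stand-in for MFCC frame features
video = rng.standard_normal((15, 64))   # stand-in for ResNet frame features

# Concatenate modalities along time so attention can mix across them.
fused_in = np.concatenate([text, audio, video], axis=0)   # (47, 64)
fused = multi_head_attention(fused_in, num_heads=8, rng=rng)
print(fused.shape)                       # (47, 64)
utterance_vec = fused.mean(axis=0)       # pooled utterance-level representation
```

Because every query can attend to keys from all three modalities, each fused time step is a mixture of text, audio, and visual evidence; a classifier head on the pooled vector would then predict the emotion label.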