Analysis of the fusion of multimodal sentiment perception and physiological signals in Chinese-English cross-cultural communication: Transformer approach incorporating self-attention enhancement.

Authors

Bi Xin, Zhang Tian

Affiliations

School of Literature, Heilongjiang University, Harbin, Heilongjiang, China.

Department of Languages and Literary Studies, Lafayette College, Easton, PA, United States.

Publication information

PeerJ Comput Sci. 2025 May 23;11:e2890. doi: 10.7717/peerj-cs.2890. eCollection 2025.

DOI:10.7717/peerj-cs.2890

PMID:40567693

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12192752/

Abstract

As globalization accelerates, cross-cultural communication has become a crucial issue in many fields. Emotion is an essential component of communication and plays a key role in improving understanding and interaction efficiency across cultures. However, accurately recognizing emotions across cultural backgrounds remains a major challenge in affective computing, in particular because traditional approaches are limited in multimodal feature fusion and temporal dependency modeling. To address this, we propose the TAF-ATRM framework, which integrates Transformer and multi-head attention mechanisms for cross-cultural emotion recognition. Specifically, the framework uses Bidirectional Encoder Representations from Transformers (BERT) to extract semantic features from text, and Mel-frequency cepstral coefficients (MFCC) and a residual neural network (ResNet) to capture critical features from speech and facial expressions, respectively. To improve the fusion of multimodal data, the Transformer models temporal features, while multi-head attention reinforces the feature representation by capturing complex inter-modal dependencies. The framework is evaluated on the MOSI and MOSEI datasets, where experimental results show that TAF-ATRM outperforms traditional methods in emotion classification accuracy and robustness, particularly in cross-cultural emotion recognition tasks. This study provides a technical foundation for future work in multimodal emotion analysis and cross-cultural affective computing.
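The inter-modal attention step described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' TAF-ATRM implementation: it assumes the text, speech, and face features have already been projected to a common dimension, it uses identity projections in place of learned query/key/value weights, and it mean-pools the attended vectors into one fused vector. The feature values and the names `attention` and `multi_head_fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        )
    return outputs, weights


def multi_head_fuse(tokens, num_heads):
    """Multi-head self-attention across modality tokens, then mean pooling.

    Assumption: identity Q/K/V projections (no learned weights); each head
    simply attends over its own slice of the feature dimension.
    """
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    attended = [[] for _ in tokens]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        chunk = [t[lo:hi] for t in tokens]
        out, _ = attention(chunk, chunk, chunk)
        for row, o in zip(attended, out):
            row.extend(o)  # concatenate the head outputs
    fused = [sum(row[i] for row in attended) / len(attended) for i in range(dim)]
    return attended, fused


# Hypothetical toy features standing in for BERT, MFCC, and ResNet outputs.
text_feat = [0.9, 0.1, 0.3, 0.5]
audio_feat = [0.2, 0.8, 0.4, 0.1]
face_feat = [0.5, 0.5, 0.7, 0.2]

attended, fused = multi_head_fuse([text_feat, audio_feat, face_feat], num_heads=2)
print(fused)
```

In a trained model the projections would be learned and the pooled vector would feed a classifier; the sketch only shows how each modality's representation becomes a weighted mixture of all three.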

