Qin Zhenkai, Luo Qining, Zang Zhidong, Fu Hongpeng
College of Information Technology, Guangxi Police College, Juntang Street, Nanning, Guangxi, China.
School of Social Development, Yangzhou University, Yangzhou, 225009, China.
Sci Rep. 2025 Mar 24;15(1):10112. doi: 10.1038/s41598-025-93023-3.
Multimodal sentiment analysis combines text, audio, and visual signals to understand human emotions. However, current methods often struggle to handle asynchronous signals and to capture long-term dependencies across modalities. Early techniques that merge multiple modalities often introduce unnecessary complexity, while newer methods that treat each modality separately may miss important relationships between the signals. Transformer-based models are effective but typically too resource-heavy for practical use. To overcome these issues, we introduce the multimodal GRU model (MulG), which uses a cross-modal attention mechanism to better synchronize the different signals and capture their dependencies. MulG also employs GRU layers, which handle sequential data efficiently, keeping the model both accurate and computationally lightweight. Extensive experiments on the CMU-MOSI, CMU-MOSEI, and IEMOCAP datasets demonstrate that MulG outperforms existing methods in accuracy, F1 score, and correlation. Specifically, MulG achieves 82.2% accuracy on CMU-MOSI's 7-class task, 82.1% on CMU-MOSEI, and 90.6% on IEMOCAP's emotion classification. Ablation studies further show that each component of the model contributes significantly to its overall performance. By addressing the limitations of previous approaches, MulG offers a practical and scalable solution for applications such as analyzing user-generated content and improving human-computer interaction.
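To make the described design concrete, the sketch below shows one way a cross-modal attention step followed by a GRU could look in PyTorch. It is a minimal illustration only: the abstract does not specify layer sizes, fusion order, or module names, so `CrossModalGRUBlock`, its dimensions, and the residual-plus-norm arrangement are assumptions for exposition, not the published MulG implementation.

```python
# Illustrative sketch, not the authors' code: a target modality (e.g. text) attends
# to a source modality (e.g. audio) with cross-modal attention, and a GRU then
# summarizes the fused sequence. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class CrossModalGRUBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, hidden: int = 64):
        super().__init__()
        # Queries come from the target modality, keys/values from the source modality,
        # so sequences of different (asynchronous) lengths can still be aligned.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # A GRU captures sequential dependencies at lower cost than a full Transformer stack.
        self.gru = nn.GRU(d_model, hidden, batch_first=True)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(query=target, key=source, value=source)
        fused = self.norm(target + attended)  # residual connection around the attention
        _, h_n = self.gru(fused)              # final hidden state summarizes the sequence
        return h_n.squeeze(0)                 # shape: (batch, hidden)


# Toy usage with mismatched sequence lengths (20 text steps vs. 50 audio frames).
text = torch.randn(8, 20, 64)
audio = torch.randn(8, 50, 64)
summary = CrossModalGRUBlock()(text, audio)
print(summary.shape)  # torch.Size([8, 64])
```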