Wen Yuhua, Li Qifei, Zhou Yingying, Gao Yingming, Wen Zhengqi, Tao Jianhua, Li Ya
IEEE Trans Neural Netw Learn Syst. 2025 Jun 18;PP. doi: 10.1109/TNNLS.2025.3578618.
Multimodal sentiment analysis (MSA) integrates various modalities, such as text, image, and audio, to provide a more comprehensive understanding of sentiment. However, effective MSA is challenged by alignment and fusion issues. Alignment requires synchronizing both temporal and semantic information across modalities, while fusion involves integrating these aligned features into a unified representation. Existing methods often address alignment or fusion in isolation, leading to limitations in performance and efficiency. To tackle these issues, we propose a novel framework called dual-stream alignment with hierarchical bottleneck fusion (DashFusion). First, the dual-stream alignment module synchronizes multimodal features through temporal and semantic alignment. Temporal alignment employs cross-modal attention (CA) to establish frame-level correspondences among multimodal sequences. Semantic alignment ensures consistency across the feature space through contrastive learning. Second, supervised contrastive learning (SCL) leverages label information to refine the modality features. Finally, hierarchical bottleneck fusion (HBF) progressively integrates multimodal information through compressed bottleneck tokens, which achieves a balance between performance and computational efficiency. We evaluate DashFusion on three datasets: CMU-MOSI, CMU-MOSEI, and CH-SIMS. Experimental results demonstrate that DashFusion achieves state-of-the-art (SOTA) performance across various metrics, and ablation studies confirm the effectiveness of our alignment and fusion techniques. The code for our experiments is available at https://github.com/ultramarineX/DashFusion.
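To make the bottleneck-fusion idea concrete, the following is a minimal NumPy sketch of one fusion layer in which modalities exchange information only through a small set of compressed bottleneck tokens. This is an illustration under simplifying assumptions (single-head attention, no learned projections, simple averaging across modalities), not the authors' implementation; all function names (`cross_attend`, `bottleneck_fusion_layer`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """Scaled dot-product cross-modal attention: `queries` attend over `keys_values`."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (Tq, Tk) frame-level correspondences
    return softmax(scores, axis=-1) @ keys_values  # (Tq, d)

def bottleneck_fusion_layer(modalities, bottleneck):
    """One fusion layer: modalities communicate only via the bottleneck tokens.

    Because the bottleneck has far fewer tokens than the modality sequences,
    the pairwise attention cost is reduced compared with full cross-attention.
    """
    # Step 1: bottleneck tokens gather a compressed summary from every modality.
    gathered = np.mean([cross_attend(bottleneck, m) for m in modalities], axis=0)
    # Step 2: each modality reads the shared summary back.
    fused = [cross_attend(m, gathered) for m in modalities]
    return fused, gathered

# Toy usage: three modality sequences of different lengths, 4 bottleneck tokens.
rng = np.random.default_rng(0)
text, audio, video = (rng.normal(size=(t, 8)) for t in (12, 20, 16))
bottleneck = rng.normal(size=(4, 8))
fused, bottleneck = bottleneck_fusion_layer([text, audio, video], bottleneck)
```

Stacking several such layers would yield the progressive, hierarchical integration the abstract describes, with the bottleneck width controlling the trade-off between fusion capacity and computational cost.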