Pre-gating and contextual attention gate - A new fusion method for multi-modal data tasks.

Affiliation

Centre for Data Science, School of Computer Science, Queensland University of Technology, 4000, Brisbane, Australia.

Publication information

Neural Netw. 2024 Nov;179:106553. doi: 10.1016/j.neunet.2024.106553. Epub 2024 Jul 17.

DOI:10.1016/j.neunet.2024.106553

Abstract

Multi-modal representation learning has received significant attention across diverse research domains due to its ability to model a scenario comprehensively. Learning the cross-modal interactions is essential to combining multi-modal data into a joint representation. However, conventional cross-attention mechanisms can produce noisy and non-meaningful values in the absence of useful cross-modal interactions among input features, thereby introducing uncertainty into the feature representation. These factors have the potential to degrade the performance of downstream tasks. This paper introduces a novel Pre-gating and Contextual Attention Gate (PCAG) module for multi-modal learning comprising two gating mechanisms that operate at distinct information processing levels within the deep learning model. The first gate filters out interactions that lack informativeness for the downstream task, while the second gate reduces the uncertainty introduced by the cross-attention module. Experimental results on eight multi-modal classification tasks spanning various domains show that the multi-modal fusion model with PCAG outperforms state-of-the-art multi-modal fusion models. Additionally, we elucidate how PCAG effectively processes cross-modality interactions.
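The abstract describes two gates around a cross-attention step but gives no equations, so the following is only a minimal NumPy sketch of one way such a design could look, not the authors' PCAG implementation. Everything specific here is an assumption made for illustration: the function names, the parameter names (`Wga`, `Wgb`, `bg`, `Wc`, `bc`), the bilinear sigmoid form of the first gate, and the convex-combination form of the second gate.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, seed=0, pre_gate_bias=0.0, context_gate_bias=0.0):
    """Hypothetical parameter set; all names are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return 0.1 * rng.standard_normal(shape)

    return {
        "Wq": w(d, d), "Wk": w(d, d), "Wv": w(d, d),      # cross-attention projections
        "Wga": w(d, d), "Wgb": w(d, d), "bg": pre_gate_bias,  # assumed first gate
        "Wc": w(2 * d, d), "bc": np.full(d, context_gate_bias),  # assumed second gate
    }


def gated_cross_attention(xa, xb, p):
    """xa: (n_a, d) features of modality A (queries); xb: (n_b, d) of modality B."""
    q, k, v = xa @ p["Wq"], xb @ p["Wk"], xb @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_a, n_b) attention weights

    # Assumed first gate: one sigmoid value per (query, key) pair that can
    # switch off interactions judged uninformative before they are aggregated.
    pre_gate = sigmoid((xa @ p["Wga"]) @ (xb @ p["Wgb"]).T + p["bg"])
    attended = (attn * pre_gate) @ v  # (n_a, d)

    # Assumed second gate: per-feature mix between the cross-attended features
    # and the original features, so unreliable attention output can be ignored.
    ctx_gate = sigmoid(np.concatenate([xa, attended], axis=-1) @ p["Wc"] + p["bc"])
    return ctx_gate * attended + (1.0 - ctx_gate) * xa


rng = np.random.default_rng(1)
xa = rng.standard_normal((3, 4))  # 3 tokens from modality A
xb = rng.standard_normal((5, 4))  # 5 tokens from modality B
fused = gated_cross_attention(xa, xb, init_params(4))
print(fused.shape)  # (3, 4)
```

In this sketch, driving both gate biases strongly negative makes the output fall back to the unimodal features `xa`, while driving them strongly positive recovers plain cross-attention; the gates interpolate between those two extremes.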

Similar articles

SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.

Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.

A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.

Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.

HAMMF: Hierarchical attention-based multi-task and multi-modal fusion model for computer-aided diagnosis of Alzheimer's disease.

Comput Biol Med. 2024 Jun;176:108564. doi: 10.1016/j.compbiomed.2024.108564. Epub 2024 May 8.

MBFusion: Multi-modal balanced fusion and multi-task learning for cancer diagnosis and prognosis.

Comput Biol Med. 2024 Oct;181:109042. doi: 10.1016/j.compbiomed.2024.109042. Epub 2024 Aug 24.

Attention-based convolutional neural network with multi-modal temporal information fusion for motor imagery EEG decoding.

Comput Biol Med. 2024 Jun;175:108504. doi: 10.1016/j.compbiomed.2024.108504. Epub 2024 Apr 24.

AutoAMS: Automated attention-based multi-modal graph learning architecture search.

Neural Netw. 2024 Nov;179:106427. doi: 10.1016/j.neunet.2024.106427. Epub 2024 Jun 22.

Development and validation of a multi-modality fusion deep learning model for differentiating glioblastoma from solitary brain metastases.

Zhong Nan Da Xue Xue Bao Yi Xue Ban. 2024 Jan 28;49(1):58-67. doi: 10.11817/j.issn.1672-7347.2024.230248.

Self-Supervised Multi-Modal Hybrid Fusion Network for Brain Tumor Segmentation.

IEEE J Biomed Health Inform. 2022 Nov;26(11):5310-5320. doi: 10.1109/JBHI.2021.3109301. Epub 2022 Nov 10.

Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.

Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
