Pre-gating and contextual attention gate - A new fusion method for multi-modal data tasks.

Affiliation

Centre for Data Science, School of Computer Science, Queensland University of Technology, 4000, Brisbane, Australia.

Publication information

Neural Netw. 2024 Nov;179:106553. doi: 10.1016/j.neunet.2024.106553. Epub 2024 Jul 17.

Abstract

Multi-modal representation learning has received significant attention across diverse research domains due to its ability to model a scenario comprehensively. Learning the cross-modal interactions is essential to combining multi-modal data into a joint representation. However, conventional cross-attention mechanisms can produce noisy and non-meaningful values in the absence of useful cross-modal interactions among input features, thereby introducing uncertainty into the feature representation. These factors have the potential to degrade the performance of downstream tasks. This paper introduces a novel Pre-gating and Contextual Attention Gate (PCAG) module for multi-modal learning comprising two gating mechanisms that operate at distinct information processing levels within the deep learning model. The first gate filters out interactions that lack informativeness for the downstream task, while the second gate reduces the uncertainty introduced by the cross-attention module. Experimental results on eight multi-modal classification tasks spanning various domains show that the multi-modal fusion model with PCAG outperforms state-of-the-art multi-modal fusion models. Additionally, we elucidate how PCAG effectively processes cross-modality interactions.
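
The abstract names two gates but gives no equations, so the following is only a minimal sketch of how a gate before cross-attention and a gate after it could fit together. Everything in it is an assumption made for illustration: the function name `gated_cross_attention`, the sigmoid form of both gates, and the parameters `pre_w`, `pre_b`, `ctx_w`, and `ctx_b` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_cross_attention(queries, context, pre_w, pre_b, ctx_w, ctx_b):
    """Hypothetical two-gate fusion; NOT the paper's exact formulation.

    queries: feature vectors from modality A, each of dimension d
    context: feature vectors from modality B, used as keys and values
    pre_w, pre_b: assumed pre-gate parameters (len(pre_w) == d)
    ctx_w, ctx_b: assumed contextual-gate parameters (len(ctx_w) == 2 * d)
    """
    d = len(queries[0])

    # Gate 1 (assumed form): scale each context vector by a score in (0, 1)
    # before attention, so uninformative features contribute less.
    gated = []
    for c in context:
        g = sigmoid(dot(pre_w, c) + pre_b)
        gated.append([g * x for x in c])

    fused = []
    for q in queries:
        # Standard scaled dot-product cross-attention over the gated context.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in gated])
        attended = [sum(w * v[i] for w, v in zip(weights, gated))
                    for i in range(d)]

        # Gate 2 (assumed form): decide how much of the attention output to
        # trust, falling back to the original query features otherwise.
        c = sigmoid(dot(ctx_w, q + attended) + ctx_b)
        fused.append([c * a + (1.0 - c) * x for a, x in zip(attended, q)])
    return fused
```

In this sketch, driving the second gate toward zero (a large negative `ctx_b`) returns the query features almost unchanged, which is one way a model could ignore a cross-attention output that carries no useful cross-modal signal.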
