COM: Contrastive Masked-attention model for incomplete multimodal learning.

Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China.

Publication information

Neural Netw. 2023 May;162:443-455. doi: 10.1016/j.neunet.2023.03.003. Epub 2023 Mar 5.

Abstract

Most multimodal learning methods assume that all modalities are always available in the data. In real-world applications, however, this assumption is often violated due to privacy protection, sensor failure, and similar causes. Previous approaches to incomplete multimodal learning typically suffer from at least one of the following drawbacks: introducing noise, lacking flexibility with respect to missing patterns, or failing to capture interactions between modalities. To overcome these challenges, we propose a COntrastive Masked-attention model (COM). The framework performs cross-modal contrastive learning with GAN-based augmentation to reduce the modality gap, and employs a masked-attention model to capture interactions between modalities. The augmentation adapts cross-modal contrastive learning to the incomplete case through a two-player game, improving the effectiveness of the multimodal representations. Interactions between modalities are modeled by stacking self-attention blocks, and attention masks restrict them to the observed modalities to avoid introducing extra noise. All modality combinations share a unified architecture, so the model is flexible across different missing patterns. Extensive experiments on six datasets demonstrate the effectiveness and robustness of the proposed method for incomplete multimodal learning.
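
The abstract outlines two mechanisms: self-attention whose masks restrict interactions to the observed modalities, and a cross-modal contrastive objective. Below is a minimal PyTorch sketch of both ideas, assuming one embedding token per modality and a per-sample boolean mask of observed modalities. The names MaskedSelfAttentionBlock and cross_modal_info_nce are illustrative, not the authors' implementation, and the paper's GAN-based augmentation (which supplies views for missing modalities via the two-player game) is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSelfAttentionBlock(nn.Module):
    """Self-attention over per-modality tokens, masked to observed modalities.

    Missing modalities are excluded from attention (and zeroed afterwards),
    so interactions are modeled only among the modalities actually present.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, tokens: torch.Tensor, observed: torch.Tensor) -> torch.Tensor:
        # tokens:   (batch, num_modalities, dim), one token per modality
        # observed: (batch, num_modalities) bool, True where the modality exists
        # key_padding_mask marks positions to IGNORE, i.e. missing modalities
        attn_out, _ = self.attn(tokens, tokens, tokens,
                                key_padding_mask=~observed)
        tokens = self.norm1(tokens + attn_out)
        tokens = self.norm2(tokens + self.ffn(tokens))
        # Zero the rows of missing modalities so they inject no extra noise
        return tokens * observed.unsqueeze(-1)

def cross_modal_info_nce(z_a: torch.Tensor, z_b: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE between two modality views of the same batch of samples."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                      # (batch, batch)
    targets = torch.arange(z_a.size(0), device=z_a.device)    # diagonal pairs match
    return F.cross_entropy(logits, targets)
```

A short usage example under the same assumptions, with two samples, three modalities, and one modality missing per sample:

```python
block = MaskedSelfAttentionBlock(dim=128)
x = torch.randn(2, 3, 128)                 # 2 samples, 3 modality tokens
obs = torch.tensor([[True, True, False],   # sample 0: modality 2 missing
                    [True, False, True]])  # sample 1: modality 1 missing
fused = block(x, obs)                      # (2, 3, 128), masked attention
loss = cross_modal_info_nce(fused[:, 0], torch.randn(2, 128))  # stand-in view
```

Because every observed/missing combination is expressed through the same mask on a shared stack of blocks, no per-pattern sub-network is needed, which is the flexibility the abstract claims.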
