

Integrating Multi-Label Contrastive Learning With Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval.

Authors

Qian Shengsheng, Xue Dizhan, Fang Quan, Xu Changsheng

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4794-4811. doi: 10.1109/TPAMI.2022.3188547. Epub 2023 Mar 7.

Abstract

With the growing amount of multimodal data, cross-modal retrieval has attracted more and more attention and become a hot research topic. To date, most existing techniques map multimodal data into a common representation space in which semantic similarities between samples can be easily measured across modalities. However, these approaches may suffer from the following limitations: 1) they bridge the modality gap by introducing a loss in the common representation space, which may not be sufficient to eliminate the heterogeneity of the modalities; 2) they treat labels as independent entities and ignore label relationships, which hinders establishing semantic connections across multimodal data; 3) they ignore the non-binary values of label similarity in multi-label scenarios, which may lead to inefficient alignment of representation similarity with label similarity. To tackle these problems, in this article, we propose two models to learn discriminative and modality-invariant representations for cross-modal retrieval. First, dual generative adversarial networks are built to project multimodal data into a common representation space. Second, to model label dependencies and develop inter-dependent classifiers, we employ multi-hop graph neural networks (consisting of a Probabilistic GNN and an Iterative GNN), where a layer-aggregation mechanism is proposed to exploit the propagation information of different hops. Third, we propose a novel soft multi-label contrastive loss for cross-modal retrieval, with a soft positive-sampling probability, which can align representation similarity with label similarity. Additionally, to adapt to incomplete-modal learning, which has wider applications, we propose a modal reconstruction mechanism to generate missing features. Extensive experiments on three widely used benchmark datasets, i.e., NUS-WIDE, MIRFlickr, and MS-COCO, show the superiority of our proposed method.
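The abstract describes a soft multi-label contrastive loss whose soft positive-sampling probability aligns representation similarity with label similarity, but it does not give the exact formulation. The snippet below is a minimal PyTorch-style sketch of that general idea only, not the authors' implementation: the function name, tensor shapes, temperature value, and the choice of cosine similarity between label vectors are all assumptions.

```python
# Illustrative sketch (assumed formulation, not the paper's): each cross-modal
# pair is weighted by its label similarity instead of a hard positive/negative
# assignment, so representation similarity is trained to follow label similarity.
import torch
import torch.nn.functional as F


def soft_multilabel_contrastive_loss(img_emb, txt_emb, labels, temperature=0.1):
    """img_emb, txt_emb: (N, D) embeddings of N paired image/text samples.
    labels: (N, C) binary multi-label matrix shared by each image-text pair."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)

    # Cross-modal representation similarities, scaled by a temperature.
    logits = img_emb @ txt_emb.t() / temperature               # (N, N)

    # Soft positive-sampling probabilities derived from label similarity
    # (cosine similarity between label vectors, renormalized per row).
    lab = F.normalize(labels.float(), dim=1)
    label_sim = lab @ lab.t()                                   # (N, N), in [0, 1]
    soft_targets = label_sim / label_sim.sum(dim=1, keepdim=True).clamp_min(1e-8)

    # Cross-entropy between the soft targets and the softmax over logits,
    # applied in both retrieval directions (image->text and text->image).
    loss_i2t = -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(soft_targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)


# Toy usage with random data (4 pairs, 128-dim embeddings, 10 labels).
if __name__ == "__main__":
    img = torch.randn(4, 128)
    txt = torch.randn(4, 128)
    y = (torch.rand(4, 10) > 0.7).float()
    print(soft_multilabel_contrastive_loss(img, txt, y).item())
```

In a training loop, a term of this kind would replace a hard-positive contrastive objective, so that pairs sharing more labels are pulled closer together than pairs sharing fewer.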

