

Integrating Multi-Label Contrastive Learning With Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval.

Authors

Qian Shengsheng, Xue Dizhan, Fang Quan, Xu Changsheng

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4794-4811. doi: 10.1109/TPAMI.2022.3188547. Epub 2023 Mar 7.

Abstract

With the growing amount of multimodal data, cross-modal retrieval has attracted more and more attention and become a hot research topic. To date, most existing techniques map multimodal data into a common representation space in which semantic similarities between samples can be easily measured across modalities. However, these approaches may suffer from the following limitations: 1) they bridge the modality gap by introducing a loss in the common representation space, which may not be sufficient to eliminate the heterogeneity of the modalities; 2) they treat labels as independent entities and ignore label relationships, which hinders establishing semantic connections across multimodal data; 3) they ignore the non-binary values of label similarity in multi-label scenarios, which may lead to inefficient alignment of representation similarity with label similarity. To tackle these problems, in this article, we propose two models to learn discriminative and modality-invariant representations for cross-modal retrieval. First, dual generative adversarial networks are built to project multimodal data into a common representation space. Second, to model label dependencies and develop inter-dependent classifiers, we employ multi-hop graph neural networks (consisting of a Probabilistic GNN and an Iterative GNN), where a layer-aggregation mechanism is proposed to exploit the propagation information of different hops. Third, we propose a novel soft multi-label contrastive loss for cross-modal retrieval, with a soft positive-sampling probability, which can align representation similarity with label similarity. Additionally, to adapt to incomplete-modal learning, which has wider applications, we propose a modal reconstruction mechanism to generate missing features. Extensive experiments on three widely used benchmark datasets, i.e., NUS-WIDE, MIRFlickr, and MS-COCO, show the superiority of our proposed method.
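The abstract describes a soft multi-label contrastive loss whose soft positive-sampling probability aligns representation similarity with label similarity, but it does not give the exact formulation. The snippet below is a minimal PyTorch-style sketch of that general idea only, not the authors' implementation: the function name, tensor shapes, temperature value, and the choice of cosine similarity between label vectors are all assumptions.

```python
# Illustrative sketch (assumed formulation, not the paper's): each cross-modal
# pair is weighted by its label similarity instead of a hard positive/negative
# assignment, so representation similarity is trained to follow label similarity.
import torch
import torch.nn.functional as F


def soft_multilabel_contrastive_loss(img_emb, txt_emb, labels, temperature=0.1):
    """img_emb, txt_emb: (N, D) embeddings of N paired image/text samples.
    labels: (N, C) binary multi-label matrix shared by each image-text pair."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)

    # Cross-modal representation similarities, scaled by a temperature.
    logits = img_emb @ txt_emb.t() / temperature               # (N, N)

    # Soft positive-sampling probabilities derived from label similarity
    # (cosine similarity between label vectors, renormalized per row).
    lab = F.normalize(labels.float(), dim=1)
    label_sim = lab @ lab.t()                                   # (N, N), in [0, 1]
    soft_targets = label_sim / label_sim.sum(dim=1, keepdim=True).clamp_min(1e-8)

    # Cross-entropy between the soft targets and the softmax over logits,
    # applied in both retrieval directions (image->text and text->image).
    loss_i2t = -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(soft_targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)


# Toy usage with random data (4 pairs, 128-dim embeddings, 10 labels).
if __name__ == "__main__":
    img = torch.randn(4, 128)
    txt = torch.randn(4, 128)
    y = (torch.rand(4, 10) > 0.7).float()
    print(soft_multilabel_contrastive_loss(img, txt, y).item())
```

In a training loop, a term of this kind would replace a hard-positive contrastive objective, so that pairs sharing more labels are pulled closer together than pairs sharing fewer.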

