School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China.
School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China.
Sensors (Basel). 2023 Mar 24;23(7):3439. doi: 10.3390/s23073439.
With the proliferation of multi-modal data generated by various sensors, unsupervised multi-modal hashing retrieval has been extensively studied for its advantages in storage, retrieval efficiency, and label independence. However, existing unsupervised methods still face two obstacles: (1) they cannot fully capture the complementary and co-occurrence information in multi-modal data, which leads to inaccurate similarity measures; and (2) they suffer from unbalanced multi-modal learning, and the semantic structure of the data is corrupted during hash code binarization. To address these obstacles, we devise an effective CLIP-based Adaptive Graph Attention Network (CAGAN) for large-scale unsupervised multi-modal hashing retrieval. First, we use the multi-modal model CLIP to extract fine-grained semantic features, mine similarity information from different perspectives of the multi-modal data, and perform similarity fusion and enhancement. We then propose an adaptive graph attention network to assist hash code learning: an attention mechanism learns adaptive graph similarity across modalities, and a graph convolutional network aggregates the intrinsic neighborhood information of neighboring data nodes to generate more discriminative hash codes. Finally, we employ an iterative approximate optimization strategy to mitigate the information loss incurred during binarization. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms several representative hashing methods on unsupervised multi-modal retrieval tasks.
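To make the similarity fusion and enhancement step concrete, the following is a minimal PyTorch sketch, not the authors' code: per-modality cosine similarities are computed from pre-extracted CLIP image and text features, combined, and then enhanced with second-order neighborhood information. The weights alpha/beta and the enhancement mixing coefficients are illustrative assumptions, not values taken from the paper.

import torch
import torch.nn.functional as F

def fused_similarity(img_feat: torch.Tensor,
                     txt_feat: torch.Tensor,
                     alpha: float = 0.5,
                     beta: float = 0.5) -> torch.Tensor:
    """img_feat, txt_feat: (n, d) CLIP image/text features for n samples."""
    img = F.normalize(img_feat, dim=1)   # unit-length rows
    txt = F.normalize(txt_feat, dim=1)
    s_img = img @ img.T                  # intra-modal cosine similarity, (n, n)
    s_txt = txt @ txt.T
    s = alpha * s_img + beta * s_txt     # similarity fusion across modalities
    # enhancement: mix in second-order (neighbor-of-neighbor) similarity
    s_enh = 0.9 * s + 0.1 * (s @ s) / s.shape[0]
    return s_enh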
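The adaptive graph attention idea can likewise be sketched under stated assumptions: pairwise attention scores produce a learned adjacency, which a graph-convolution-style step uses to aggregate neighborhood information before a tanh relaxation of the hash codes. The single-head attention form and layer sizes here are illustrative, not the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphAttentionHash(nn.Module):
    def __init__(self, in_dim: int, code_len: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, code_len)
        self.attn = nn.Linear(2 * code_len, 1)  # scores a pair of nodes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.proj(x)                         # (n, code_len)
        n = h.size(0)
        # pairwise attention logits e_ij = a([h_i || h_j])
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        e = self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1)
        adj = F.softmax(F.leaky_relu(e), dim=1)  # adaptive graph, row-stochastic
        h_agg = adj @ h                          # aggregate neighbor information
        return torch.tanh(h_agg)                 # relaxed hash codes in (-1, 1)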
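Finally, a hedged illustration of iterative approximate optimization for binarization: the binary codes B are refreshed in closed form as sign(H) between gradient steps on the continuous codes H, so the quantization gap ||B - H||^2 shrinks gradually instead of binarizing in one lossy step. This is a generic alternating scheme with an assumed quantization weight gamma, not the paper's exact update rule.

import torch

def train_step(model, x, sim_target, optimizer, gamma: float = 0.3):
    h = model(x)                          # continuous codes, (n, code_len)
    b = torch.sign(h).detach()            # closed-form binary update
    sim_pred = (h @ h.T) / h.size(1)      # code-space similarity
    loss = ((sim_pred - sim_target) ** 2).mean() \
         + gamma * ((b - h) ** 2).mean()  # quantization penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()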