Object-Level Visual-Text Correlation Graph Hashing for Unsupervised Cross-Modal Retrieval.

Author Information

Shi Ge, Li Feng, Wu Lifang, Chen Yukun

Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.

Publication Information

Sensors (Basel). 2022 Apr 11;22(8):2921. doi: 10.3390/s22082921.

Abstract

The core of cross-modal hashing methods is to map high-dimensional features into binary hash codes, so that the Hamming distance metric can be used to improve retrieval efficiency. Recent developments emphasize the advantages of unsupervised cross-modal hashing, since it relies only on the relevance information of paired data, making it more applicable to real-world scenarios. However, two problems, namely intra-modality correlation and inter-modality correlation, have still not been fully considered. Intra-modality correlation describes the complex overall concept within a single modality and provides semantic relevance for retrieval tasks, while inter-modality correlation refers to the relationship between different modalities. Based on our observation and hypothesis, the dependency relationships within a modality and between different modalities can be constructed at the object level, which can further improve cross-modal hashing retrieval accuracy. To this end, we propose an Object-level Visual-text Correlation Graph Hashing (OVCGH) approach to mine fine-grained object-level similarity in cross-modal data while suppressing noise interference. Specifically, a novel intra-modality correlation graph is designed to learn graph-level representations of each modality, obtaining region-to-region and tag-to-tag dependency relationships in an unsupervised manner. Then, we design a visual-text dependency building module that captures correlated semantic information between modalities by modeling the dependency relationships between image object regions and text tags. Extensive experiments on two widely used datasets verify the effectiveness of the proposed approach.
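The opening sentence summarizes the generic cross-modal hashing pipeline: project each modality's continuous features into short binary codes, then rank database items by Hamming distance to the query code. The NumPy sketch below illustrates only that generic pipeline, not OVCGH itself; the random projection matrices, feature dimensions, and 64-bit code length are hypothetical stand-ins for learned hash functions.

```python
# Minimal sketch of generic cross-modal hashing retrieval (not the paper's method).
# Real systems learn the projections; here random matrices stand in for them.
import numpy as np

rng = np.random.default_rng(0)

def binarize(features, projection):
    """Map real-valued features to {0, 1} hash codes via a (learned) projection."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_distance(query_code, database_codes):
    """Count differing bits between the query code and every database code."""
    return np.count_nonzero(database_codes != query_code, axis=1)

# Hypothetical sizes: 2048-d image features, 512-d text features, 64-bit codes.
img_feats = rng.standard_normal((1000, 2048))   # database image features
txt_feat = rng.standard_normal((1, 512))        # one text query feature
W_img = rng.standard_normal((2048, 64))         # stand-in image hash projection
W_txt = rng.standard_normal((512, 64))          # stand-in text hash projection

img_codes = binarize(img_feats, W_img)          # (1000, 64) binary codes
txt_code = binarize(txt_feat, W_txt)[0]         # (64,) binary query code

# Text-to-image retrieval: rank images by Hamming distance to the query code.
ranking = np.argsort(hamming_distance(txt_code, img_codes))
print("Top-5 retrieved image indices:", ranking[:5])
```

In OVCGH and related methods, the projections above would be replaced by hash functions trained so that the object-level intra-modality and inter-modality correlations described in the abstract are preserved in the binary codes.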

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26c9/9029824/9c462fda4197/sensors-22-02921-g001.jpg
