
Discrete Semantic Alignment Hashing for Cross-Media Retrieval.

Authors

Yao Tao, Kong Xiangwei, Fu Haiyan, Tian Qi

Publication

IEEE Trans Cybern. 2020 Dec;50(12):4896-4907. doi: 10.1109/TCYB.2019.2912644. Epub 2020 Dec 3.

Abstract

Cross-media hashing, which maps data from different modalities into a low-dimensional shared Hamming space, has attracted considerable attention due to the rapid growth of multimodal data such as images and texts. Recent cross-media hashing work mainly aims at learning compact hash codes that preserve class-label-based or feature-based similarities among samples. However, these methods ignore the unbalanced semantic gaps between different modalities and high-level semantic concepts, which generally results in less effective hash functions and unsatisfactory retrieval performance. Specifically, the keywords of texts carry semantic meaning, while the low-level features of images do not; the semantic gap in the image modality is therefore larger than that in the text modality. In this paper, we propose a simple yet effective hashing method for cross-media retrieval, dubbed discrete semantic alignment hashing (DSAH), to address this problem. First, DSAH exploits collaborative filtering to mine the relations between class labels and hash codes, which reduces memory consumption and computational cost compared with pairwise similarity. Then, attributes of the image modality are employed to align its semantic information with the text modality. Finally, to further improve the quality of the hash codes, we propose a discrete optimization algorithm that learns discrete hash codes directly, with a closed-form solution for each bit. Extensive experiments on multiple public databases show that our model can seamlessly incorporate attributes and achieves promising performance.
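The retrieval setting the abstract describes can be illustrated with a minimal sketch: map an image query and a text database into a shared binary code space, then rank database items by Hamming distance. The hash functions below are toy random sign projections (a common baseline, not the learned DSAH projections), and all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_bits = 16          # length of the shared Hamming space
d_img, d_txt = 64, 32  # per-modality feature dimensions (illustrative)

# Toy modality-specific hash functions: random linear projection + sign.
W_img = rng.standard_normal((d_img, n_bits))
W_txt = rng.standard_normal((d_txt, n_bits))

def hash_codes(X, W):
    """Map real-valued features to {0, 1} codes via sign(X @ W)."""
    return (X @ W > 0).astype(np.uint8)

def hamming_distances(code, codes):
    """Hamming distance from one code to each row of a code matrix."""
    return np.count_nonzero(code != codes, axis=1)

# Fake database of text items and one image query.
texts = rng.standard_normal((100, d_txt))
query_img = rng.standard_normal(d_img)

db_codes = hash_codes(texts, W_txt)
query_code = hash_codes(query_img[None, :], W_img)[0]

# Cross-media retrieval: image query against text database.
dists = hamming_distances(query_code, db_codes)
top5 = np.argsort(dists)[:5]
print("top-5 indices:", top5, "distances:", dists[top5])
```

With random projections the two modalities are of course not semantically aligned; the point of methods like DSAH is to learn projections so that related image and text items land close in this Hamming space.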

