Zou Qin, Cao Ling, Zhang Zheng, Chen Long, Wang Song
IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1673-1687. doi: 10.1109/TNNLS.2020.3043298. Epub 2022 Apr 4.
Hash coding has been widely used for approximate nearest neighbor search in large-scale image retrieval. Given semantic annotations such as class labels and pairwise similarities of the training data, hashing methods can learn and generate effective and compact binary codes. However, newly introduced images may carry semantic labels that were undefined during training; we call these unseen images, and zero-shot hashing (ZSH) techniques have been studied to retrieve them. Existing ZSH methods mainly focus on the retrieval of single-label images and cannot handle multilabel ones. In this article, a novel transductive ZSH method is proposed, for the first time, for multilabel unseen image retrieval. To predict the labels of the unseen/target data, a visual-semantic bridge is built via instance-concept coherence ranking on the seen/source data. A pairwise similarity loss and a focal quantization loss are then constructed to train a hashing model on both the seen/source and unseen/target data. Extensive evaluations on three popular multilabel data sets demonstrate that the proposed hashing method achieves significantly better results than the comparison methods.
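The two training objectives named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation: the pairwise similarity loss is written here as a cross-entropy over normalized code inner products, and the focal quantization loss is assumed to apply a focal-style down-weighting `(1 - q)^gamma` to bits that are already close to ±1, so the penalty concentrates on poorly quantized bits. The function names, the mapping of inner products to [0, 1], and the choice of `gamma` are illustrative assumptions.

```python
import numpy as np

def pairwise_similarity_loss(codes, S, eps=1e-7):
    """Cross-entropy between pairwise code agreement and a binary
    similarity matrix S (1 = similar pair, 0 = dissimilar pair).

    codes: (n, b) relaxed hash codes with entries in [-1, 1]
    S:     (n, n) pairwise similarity labels in {0, 1}
    """
    b = codes.shape[1]
    inner = codes @ codes.T / b        # normalized inner product in [-1, 1]
    pred = (inner + 1.0) / 2.0         # map agreement to [0, 1]
    return -np.mean(S * np.log(pred + eps)
                    + (1.0 - S) * np.log(1.0 - pred + eps))

def focal_quantization_loss(codes, gamma=2.0, eps=1e-7):
    """Focal-style quantization penalty (illustrative form, not the
    paper's exact definition): q = |code| measures how close each bit
    is to {-1, +1}; well-quantized bits (q near 1) are down-weighted
    by (1 - q)^gamma so training focuses on ambiguous bits near 0."""
    q = np.clip(np.abs(codes), 0.0, 1.0)
    return -np.mean((1.0 - q) ** gamma * np.log(q + eps))

# Illustration: exactly binary codes incur (near-)zero quantization loss,
# and similar pairs with matching codes incur low pairwise loss.
rng = np.random.default_rng(0)
relaxed = np.tanh(rng.normal(size=(4, 16)))   # relaxed codes in (-1, 1)
binary = np.sign(relaxed)                     # fully quantized codes
S = np.ones((4, 4))                           # pretend all pairs are similar
```

In practice both losses would be summed (with a trade-off weight) and minimized jointly over the seen/source and unseen/target data, which is what makes the method transductive.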