Ji Zhong, Sun Yuxin, Yu Yunlong, Pang Yanwei, Han Jungong
IEEE Trans Neural Netw Learn Syst. 2020 Jan;31(1):321-330. doi: 10.1109/TNNLS.2019.2904991. Epub 2019 Apr 11.
Zero-shot hashing (ZSH) aims at learning a hashing model that is trained only on instances from seen categories but can generalize well to those of unseen categories. Typically, this is achieved by utilizing a semantic embedding space to transfer knowledge from the seen domain to the unseen domain. Existing efforts mainly focus on single-modal retrieval tasks, especially image-based image retrieval (IBIR). However, as a highlighted research topic in the field of hashing, cross-modal retrieval is more common in real-world applications. To address the cross-modal ZSH (CMZSH) retrieval task, we propose a novel attribute-guided network (AgNet), which can perform not only IBIR but also text-based image retrieval (TBIR). In particular, AgNet aligns data from different modalities into a semantically rich attribute space, which bridges the gap caused by modality heterogeneity and the zero-shot setting. We also design an effective strategy that exploits the attributes to guide the generation of hash codes for image and text within the same network. Extensive experimental results on three benchmark data sets (AwA, SUN, and ImageNet) demonstrate the superiority of AgNet on both cross-modal and single-modal zero-shot image retrieval tasks.
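The pipeline the abstract describes (align each modality into a shared attribute space, then derive binary hash codes from that space) can be illustrated with a minimal sketch. This is not the paper's actual architecture: the projection matrices `W_img`, `W_txt`, and `W_hash`, the dimensions, and the `tanh`/`sign` choices are all illustrative assumptions standing in for trained network layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: image features (2048-d), text features (300-d),
# a shared attribute space (85-d, e.g. the AwA attribute vocabulary),
# and 64-bit hash codes. All values here are illustrative, not the paper's.
D_IMG, D_TXT, D_ATTR, N_BITS = 2048, 300, 85, 64

# Stand-ins for learned projections (trained layers in the real model;
# random here only to show the data flow).
W_img = rng.standard_normal((D_IMG, D_ATTR)) * 0.01
W_txt = rng.standard_normal((D_TXT, D_ATTR)) * 0.01
W_hash = rng.standard_normal((D_ATTR, N_BITS)) * 0.1

def to_attribute_space(x, W):
    """Align a modality-specific feature vector into the shared attribute space."""
    return np.tanh(x @ W)

def attribute_guided_hash(a, W=W_hash):
    """Binarize an attribute-space embedding into a +/-1 hash code."""
    return np.sign(a @ W).astype(np.int8)

img_feat = rng.standard_normal(D_IMG)   # e.g. a CNN image feature
txt_feat = rng.standard_normal(D_TXT)   # e.g. a word-embedding text feature

h_img = attribute_guided_hash(to_attribute_space(img_feat, W_img))
h_txt = attribute_guided_hash(to_attribute_space(txt_feat, W_txt))

# Cross-modal retrieval then ranks database items by Hamming distance
# between query and database codes.
hamming = int((h_img != h_txt).sum())
```

Because both modalities pass through the same attribute space before hashing, image and text codes are directly comparable, which is what makes both IBIR and TBIR possible in one network.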