Peng Liangkang, Qian Jiangbo, Xu Zhengtao, Xin Yu, Guo Lijun
IEEE Trans Image Process. 2023;32:1759-1773. doi: 10.1109/TIP.2023.3251028. Epub 2023 Mar 14.
Learning hash functions has been widely applied for large-scale image retrieval. Existing methods usually use CNNs to process an entire image at once, which is efficient for single-label images but not for multi-label images. First, these methods cannot fully exploit the independent features of different objects in one image, so small objects that carry important information are ignored. Second, they cannot capture the different semantic information conveyed by dependency relations among objects. Third, existing methods ignore the impact of the imbalance between hard and easy training pairs, which leads to suboptimal hash codes. To address these issues, we propose a novel deep hashing method, termed multi-label hashing for dependency relations among multiple objectives (DRMH). We first use an object detection network to extract object feature representations so that small object features are not ignored, then fuse the object visual features with position features and capture dependency relations among objects with a self-attention mechanism. In addition, we design a weighted pairwise hash loss to address the imbalance between hard and easy training pairs. Extensive experiments on multi-label and zero-shot datasets show that DRMH outperforms many state-of-the-art hashing methods across different evaluation metrics.
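The abstract outlines a pipeline in which detected object features are fused with position features and related through self-attention before being mapped to hash codes. The following PyTorch sketch illustrates that idea under stated assumptions; the module names, feature dimensions, fusion scheme, and pooling choice are illustrative and not taken from the paper's implementation.

```python
# Minimal sketch: object features + box-position features -> self-attention over
# objects (dependency relations) -> pooled image representation -> relaxed hash code.
import torch
import torch.nn as nn


class ObjectRelationHasher(nn.Module):
    def __init__(self, feat_dim=2048, pos_dim=4, hidden_dim=512, n_heads=8, n_bits=48):
        super().__init__()
        # Project detector features and normalized box coordinates, then fuse by addition.
        self.visual_proj = nn.Linear(feat_dim, hidden_dim)
        self.pos_proj = nn.Linear(pos_dim, hidden_dim)
        # Self-attention layers capture dependency relations among detected objects.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Hash layer: tanh gives relaxed codes in (-1, 1); sign() binarizes at test time.
        self.hash_layer = nn.Sequential(nn.Linear(hidden_dim, n_bits), nn.Tanh())

    def forward(self, obj_feats, obj_boxes):
        # obj_feats: (B, N, feat_dim) region features from an object detector
        # obj_boxes: (B, N, 4) normalized (x1, y1, x2, y2) box coordinates
        tokens = self.visual_proj(obj_feats) + self.pos_proj(obj_boxes)
        tokens = self.encoder(tokens)          # model inter-object dependencies
        image_repr = tokens.mean(dim=1)        # pool object tokens into one vector
        return self.hash_layer(image_repr)     # relaxed hash code, shape (B, n_bits)


codes = ObjectRelationHasher()(torch.randn(2, 10, 2048), torch.rand(2, 10, 4))
binary_codes = torch.sign(codes)  # binarized codes used for retrieval
```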
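The weighted pairwise hash loss is described only at a high level in the abstract. The sketch below shows one plausible reading: a standard likelihood-based pairwise loss on relaxed codes, reweighted with a focal-style factor so that hard pairs contribute more than easy ones. The specific weighting function and the `gamma` parameter are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a weighted pairwise hash loss: up-weight hard pairs whose
# predicted similarity disagrees with the label, down-weight easy pairs.
import torch
import torch.nn.functional as F


def weighted_pairwise_hash_loss(codes, sim, gamma=2.0):
    # codes: (B, n_bits) relaxed hash codes in (-1, 1)
    # sim:   (B, B) pairwise labels, 1 if two images share at least one label else 0
    theta = 0.5 * codes @ codes.t()         # scaled inner products between codes
    prob = torch.sigmoid(theta)             # predicted pairwise similarity
    # Per-pair negative log-likelihood in a numerically stable softplus form.
    nll = F.softplus(theta) - sim * theta
    # Hardness: similar pairs with low predicted similarity (or dissimilar pairs
    # with high predicted similarity) are hard; the focal-style factor emphasizes them.
    hardness = torch.where(sim > 0, 1.0 - prob, prob)
    weight = hardness.pow(gamma)
    return (weight * nll).mean()
```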