IEEE Trans Image Process. 2018 Sep;27(9):4490-4502. doi: 10.1109/TIP.2018.2839522.
Learning-based hashing is a leading approach to approximate nearest neighbor search for large-scale image retrieval. In this paper, we develop a deep supervised hashing method for multi-label image retrieval. The method learns a binary "mask" map that identifies the approximate locations of objects in an image, and uses this map to produce length-limited hash codes that focus mainly on the image's objects and ignore the background. The proposed deep architecture consists of four parts: 1) a convolutional sub-network that generates effective image features; 2) a binary "mask" sub-network that identifies the approximate locations of image objects; 3) a weighted average pooling operation, based on the binary "mask", that yields feature representations and hash codes attending mostly to foreground objects rather than the background; and 4) a combination of a triplet ranking loss, designed to preserve relative similarities among images, and a cross-entropy loss defined on image labels. We conduct comprehensive evaluations on four multi-label image data sets. The results indicate that the proposed hashing method outperforms state-of-the-art supervised and unsupervised hashing baselines.
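As a rough illustration of parts 3) and 4), the sketch below shows one way mask-weighted average pooling and a triplet ranking loss could look on toy data. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names, the zero threshold used for binarization, the squared Euclidean distance, the margin value, and the fallback to plain averaging when the mask is empty. It is not the paper's implementation, and the cross-entropy term is omitted.

```python
def masked_average_pool(features, mask):
    """Average each channel over the positions where mask == 1.

    features: nested list of shape C x H x W (one H x W map per channel).
    mask:     nested list of shape H x W holding 0 or 1.
    Assumption: if the mask selects nothing, fall back to plain averaging.
    """
    height, width = len(mask), len(mask[0])
    selected = sum(sum(row) for row in mask)
    if selected == 0:
        mask = [[1] * width for _ in range(height)]
        selected = height * width
    pooled = []
    for channel in features:
        weighted_sum = sum(
            channel[i][j] * mask[i][j]
            for i in range(height)
            for j in range(width)
        )
        pooled.append(weighted_sum / selected)
    return pooled


def to_hash_code(vector, threshold=0.0):
    """Binarize a real-valued vector (assumed rule: 1 if above threshold)."""
    return [1 if value > threshold else 0 for value in vector]


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that asks the anchor to be closer to the positive than to
    the negative by at least `margin` (assumed: squared Euclidean distance)."""
    dist_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    dist_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + dist_pos - dist_neg)


# Toy 2-channel, 2 x 2 feature map; the mask marks the top row as "object".
features = [
    [[1.0, 3.0], [10.0, 10.0]],
    [[-2.0, -4.0], [5.0, 5.0]],
]
mask = [[1, 1], [0, 0]]

pooled = masked_average_pool(features, mask)   # background row is ignored
code = to_hash_code(pooled)
```

With this toy input the pooled vector is `[2.0, -3.0]`, whereas plain average pooling over all four positions would give `[6.0, 1.0]`, so the masked version reflects only the region marked as foreground. In a trained network the mask and features would come from the sub-networks described above rather than being hard-coded.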