IEEE Trans Image Process. 2015 Dec;24(12):4766-79. doi: 10.1109/TIP.2015.2467315. Epub 2015 Aug 11.
Extracting informative image features and learning effective approximate hashing functions are two crucial steps in image retrieval. Conventional methods often study these two steps separately, e.g., by learning hash functions from a predefined hand-crafted feature space. Moreover, most previous methods preset the bit length of the output hashing codes, which neglects the differing significance of individual bits and restricts practical flexibility. To address these issues, we propose a supervised learning framework that generates compact and bit-scalable hashing codes directly from raw images. We pose hashing learning as a problem of regularized similarity learning. In particular, we organize the training images into a batch of triplet samples, each sample containing two images with the same label and one with a different label. With these triplet samples, we maximize the margin between the matched pairs and the mismatched pairs in the Hamming space. In addition, a regularization term is introduced to enforce adjacency consistency, i.e., images of similar appearance should have similar codes. A deep convolutional neural network is used to train the model in an end-to-end fashion, so that discriminative image features and hash functions are optimized simultaneously. Furthermore, the bits of our hashing codes are unequally weighted, so that the code length can be adjusted by truncating the insignificant bits. Our framework outperforms state-of-the-art methods on public benchmarks for similar image search and also achieves promising results in the application of person re-identification in surveillance. It is also shown that the generated bit-scalable hashing codes preserve their discriminative power well at shorter code lengths.
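The triplet margin and the bit truncation described in the abstract can be illustrated with a minimal Python sketch. It assumes a generic hinge-style triplet loss and a simple "keep the k heaviest bits" truncation rule; neither is confirmed as the paper's exact objective or procedure. The function names (`weighted_hamming`, `truncate`, `triplet_hinge`), the margin value, and the example codes and weights are all hypothetical, and in the actual framework the codes and weights would be learned by the network rather than written by hand.

```python
from typing import List, Sequence, Tuple


def weighted_hamming(a: Sequence[int], b: Sequence[int],
                     weights: Sequence[float]) -> float:
    """Sum the weights of the bit positions where the two codes differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def truncate(code: Sequence[int], weights: Sequence[float],
             k: int) -> Tuple[List[int], List[float]]:
    """Keep the k bits with the largest weights (assumed truncation rule)."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = sorted(ranked[:k])  # restore the original bit order
    return [code[i] for i in keep], [weights[i] for i in keep]


def triplet_hinge(anchor: Sequence[int], positive: Sequence[int],
                  negative: Sequence[int], weights: Sequence[float],
                  margin: float = 1.0) -> float:
    """Generic hinge loss: zero once the mismatched pair is at least
    `margin` farther from the anchor than the matched pair."""
    d_pos = weighted_hamming(anchor, positive, weights)
    d_neg = weighted_hamming(anchor, negative, weights)
    return max(0.0, margin + d_pos - d_neg)


# Hypothetical 4-bit codes and per-bit weights, for illustration only.
weights = [0.9, 0.6, 0.3, 0.1]
anchor = [1, 0, 1, 1]
positive = [1, 0, 1, 0]   # same label as the anchor
negative = [0, 1, 1, 1]   # different label

loss = triplet_hinge(anchor, positive, negative, weights)
short_anchor, short_weights = truncate(anchor, weights, k=2)
```

In this toy example the matched pair differs only in the lowest-weighted bit, so dropping the two lightest bits leaves the mismatched pair just as far from the anchor as before. That is the intuition behind bit-scalable codes: shorter codes retain most of the discriminative power when the discarded bits carry little weight.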