IEEE Trans Image Process. 2018 Oct;27(10):4945-4957. doi: 10.1109/TIP.2018.2845120.
Deep convolutional neural networks (CNNs) have been widely and successfully applied to many computer vision tasks, such as classification, detection, and semantic segmentation. For image retrieval, off-the-shelf CNN features from models trained for classification have shown promise, but learning features specifically oriented toward instance retrieval remains a challenge. Motivated by the success of the low-level SIFT feature in image retrieval and its complementary nature to the semantic-aware CNN feature, in this paper we propose to embed the SIFT feature into the CNN feature using a Siamese structure in a learning-based paradigm. The learning objective consists of two losses: a similarity loss and a fidelity loss. The similarity loss embeds the image-level nearest-neighbor structure defined by the SIFT feature into CNN feature learning, while the fidelity loss requires that the feature produced by the updated CNN model stay close to the feature produced by the original CNN model trained solely for classification. After learning, the generated CNN feature inherits properties of the SIFT feature and is therefore well suited to image retrieval. We evaluate our approach on public data sets, and comprehensive experiments demonstrate the effectiveness of the proposed method.
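The two-part objective described in the abstract can be illustrated with a small sketch. The abstract does not give the loss formulas, so everything here is an assumption made for illustration: the similarity term is written as a standard contrastive loss over one image pair, the fidelity term as a squared Euclidean distance to the original feature, and the names `similarity_loss`, `fidelity_loss`, `pair_objective`, `margin`, and `weight` are hypothetical rather than taken from the paper.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two feature vectors given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def similarity_loss(feat_a, feat_b, is_sift_neighbor, margin=1.0):
    """Assumed contrastive form for one image pair from the two Siamese branches.

    is_sift_neighbor is True when the two images are nearest neighbors
    under the SIFT-based image-level similarity.
    """
    d = l2_distance(feat_a, feat_b)
    if is_sift_neighbor:
        # Pull SIFT neighbors together in CNN feature space.
        return d ** 2
    # Push non-neighbors apart until they are at least `margin` away.
    return max(0.0, margin - d) ** 2


def fidelity_loss(feat_updated, feat_original):
    """Assumed squared distance between the updated and the original CNN feature."""
    return l2_distance(feat_updated, feat_original) ** 2


def pair_objective(feat_a, feat_b, orig_a, orig_b, is_sift_neighbor,
                   weight=0.5, margin=1.0):
    """Similarity term plus a weighted fidelity term for both images of the pair."""
    sim = similarity_loss(feat_a, feat_b, is_sift_neighbor, margin)
    fid = fidelity_loss(feat_a, orig_a) + fidelity_loss(feat_b, orig_b)
    return sim + weight * fid
```

In this sketch, `weight` trades off how strongly the updated features are tied to those of the classification-trained model; a value of zero would reduce the objective to the similarity term alone.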