

Saliency Inside: Learning Attentive CNNs for Content-based Image Retrieval.

Authors

Wei Shikui, Liao Lixin, Li Jia, Zheng Qinjie, Yang Fei, Zhao Yao

Publication

IEEE Trans Image Process. 2019 May 2. doi: 10.1109/TIP.2019.2913513.

Abstract

In content-based image retrieval (CBIR), one of the most challenging and ambiguous tasks is to correctly understand the human query intention and measure its semantic relevance with images in the database. Because visual saliency has shown an impressive capability to predict human visual attention, which is closely related to the query intention, this paper attempts to explicitly uncover the essential effect of visual saliency in CBIR via qualitative and quantitative experiments. Toward this end, we first generate fixation density maps for images from a widely used CBIR dataset by using an eye-tracking apparatus. These ground-truth saliency maps are then used to measure the influence of visual saliency on the CBIR task by exploring several probable ways of incorporating such saliency cues into the retrieval process. We find that visual saliency is indeed beneficial to the CBIR task, and that the best scheme for involving saliency may differ across image retrieval models. Inspired by these findings, this paper presents two-stream attentive CNNs with saliency embedded inside for CBIR. The proposed network has two streams that simultaneously handle two tasks. The main stream focuses on extracting discriminative visual features that are tightly related to semantic attributes. Meanwhile, the auxiliary stream aims to facilitate the main stream by redirecting feature extraction toward the salient image content that humans may attend to. By fusing these two streams into the Main and Auxiliary CNNs (MAC), image similarity can be computed as humans do, by preserving conspicuous content and suppressing irrelevant regions. Extensive experiments show that the proposed model achieves impressive performance in image retrieval on four public datasets.
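The paper's MAC architecture is not reproduced here, but the core idea the abstract describes — weighting convolutional features by a saliency map so that salient content is preserved and irrelevant regions are suppressed before computing image similarity — can be illustrated with a minimal NumPy sketch. All function names, array shapes, and the toy data below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attentive_descriptor(features, saliency):
    """Saliency-weighted global pooling of a conv feature map.

    features: (C, H, W) activations from some CNN layer.
    saliency: (H, W) map in [0, 1], high where humans would attend.
    Returns an L2-normalized C-dimensional image descriptor.
    """
    # Suppress non-salient regions by element-wise modulation.
    weighted = features * saliency[None, :, :]
    # Global average pooling collapses the spatial dimensions.
    desc = weighted.mean(axis=(1, 2))
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

def similarity(d1, d2):
    """Cosine similarity of two L2-normalized descriptors."""
    return float(np.dot(d1, d2))

# Toy example: random "features" with a salient central region.
rng = np.random.default_rng(0)
feats = rng.random((8, 4, 4))
sal = np.zeros((4, 4))
sal[1:3, 1:3] = 1.0          # only the center is salient
dq = attentive_descriptor(feats, sal)
```

With real saliency maps (ground-truth fixation maps or the auxiliary stream's prediction) in place of the toy mask, background clutter contributes nothing to the descriptor, so the similarity score is driven by the content a human query would focus on.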

