Department of Information Engineering, the Chinese University of Hong Kong, Shatin, Hong Kong.
IEEE Trans Pattern Anal Mach Intell. 2012 Jul;34(7):1342-53. doi: 10.1109/TPAMI.2011.242. Epub 2011 Dec 13.
Web-scale image search engines (e.g., Google image search, Bing image search) rely mostly on surrounding text features. Because it is difficult to interpret users' search intention from query keywords alone, they often return ambiguous and noisy results that are far from satisfactory. Exploiting visual information is therefore important for resolving the ambiguity in text-based image retrieval. In this paper, we propose a novel Internet image search approach. It requires the user only to click on one query image, a minimal-effort interaction, after which images from a pool retrieved by text-based search are reranked based on both visual and textual content. Our key contribution is to capture the user's search intention from this one-click query image in four steps. 1) The query image is categorized into one of several predefined adaptive weight categories, which reflect the user's search intention at a coarse level. Within each category, a specific weight schema, adapted to that kind of image, is used to combine visual features and better rerank the text-based search results. 2) Based on the visual content of the query image selected by the user, and through image clustering, the query keywords are expanded to capture the user's intention. 3) The expanded keywords are used to enlarge the image pool so that it contains more relevant images. 4) The expanded keywords are also used to expand the query image into multiple positive visual examples, from which new query-specific visual and textual similarity metrics are learned to further improve content-based image reranking. All these steps are automatic and require no extra effort from the user. This is critically important for any commercial web-based image search engine, whose user interface must be extremely simple. Beyond this key contribution, we also design a set of visual features that are both effective and efficient for Internet image search.
Experimental evaluation shows that our approach significantly improves both the precision of the top-ranked images and the user experience.
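The core reranking idea in the abstract, scoring each candidate by a weighted combination of visual and textual similarity to the one-click query image, can be illustrated with a minimal sketch. All names here (`rerank`, the `feat`/`words` fields, the example weights) are hypothetical for illustration; the paper's actual features, weight categories, and learned metrics are far richer.

```python
import math

def cosine(a, b):
    # Visual similarity between two feature vectors.
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def text_overlap(q_words, words):
    # Textual similarity as Jaccard overlap of surrounding-text keywords.
    q, w = set(q_words), set(words)
    return len(q & w) / len(q | w) if q | w else 0.0

def rerank(query, pool, w_visual=0.7, w_text=0.3):
    # Combined score: weighted sum of visual and textual similarity.
    # In the paper, the weight schema would be chosen according to the
    # query image's adaptive weight category, not fixed as here.
    scored = [
        (w_visual * cosine(query["feat"], img["feat"])
         + w_text * text_overlap(query["words"], img["words"]), img["id"])
        for img in pool
    ]
    return [img_id for _, img_id in sorted(scored, reverse=True)]

if __name__ == "__main__":
    query = {"feat": [1.0, 0.0], "words": ["apple", "fruit"]}
    pool = [
        {"id": "A", "feat": [0.0, 1.0], "words": ["apple"]},
        {"id": "B", "feat": [1.0, 0.0], "words": ["apple", "fruit"]},
    ]
    print(rerank(query, pool))  # B matches both visually and textually
```

Steps 2-4 of the pipeline would then refine this scoring: expanded keywords enlarge `pool`, and the fixed `cosine`/`text_overlap` functions are replaced by query-specific similarity metrics learned from the expanded positive examples.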