Annotating images by mining image search results.

Author Information

Wang Xin-Jing, Zhang Lei, Li Xirong, Ma Wei-Ying

Affiliations

Microsoft Research Asia, 4F Sigma Center, 49 Zhichun Road, Haidian District, Beijing 100190, PR China.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2008 Nov;30(11):1919-32. doi: 10.1109/TPAMI.2008.127.

Abstract

Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation: a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework, where a query keyword is provided along with the uncaptioned image to improve both effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by step 2. To ensure real-time annotation, two key techniques are leveraged: one is to map the high-dimensional image visual features into hash codes; the other is to implement the approach as a distributed system, of which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotation with an unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and-conquer framework, a query keyword is not crucial to our current implementation; we provide experimental results to prove this.
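The three-step process can be sketched in a few lines of Python. This is a minimal illustration only: the combined visual/textual `search_index` backend, the `surrounding_text` result field, and the frequency-based term scoring are assumptions standing in for the paper's actual search service, salient-term mining, and annotation rejection components.

```python
from collections import Counter

def annotate(query_image, keyword, search_index, top_k=100, reject_threshold=0.1):
    """Annotate an uncaptioned image by mining its search results."""
    # Step 1: search -- retrieve results that are visually similar to the
    # uncaptioned image and semantically related to the query keyword.
    # `search_index` is a hypothetical combined visual/textual backend.
    results = search_index.search(image=query_image, text=keyword, top_k=top_k)

    # Step 2: mining -- collect candidate terms from the surrounding text
    # of each result; raw term frequency stands in for salient-term mining.
    term_counts = Counter()
    for result in results:
        term_counts.update(result["surrounding_text"].lower().split())

    # Step 3: annotation rejection -- keep only terms supported by a
    # sufficient fraction of the search results.
    return sorted(term for term, count in term_counts.items()
                  if count / len(results) >= reject_threshold)
```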

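Of the two real-time techniques, the feature-to-hash-code mapping can be illustrated with random hyperplane projection, a standard locality-sensitive hashing scheme. The abstract does not specify the paper's exact hashing method, so the sketch below is an assumption for illustration.

```python
import numpy as np

def make_hasher(feature_dim, n_bits=32, seed=0):
    """Map high-dimensional feature vectors to compact n_bits-bit codes."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, feature_dim))  # random hyperplanes
    weights = 1 << np.arange(n_bits, dtype=np.uint64)    # bit-position weights

    def hash_code(features):
        bits = planes @ np.asarray(features) > 0         # side of each hyperplane
        return int(bits.astype(np.uint64) @ weights)     # pack bits into an integer

    return hash_code

# Nearby feature vectors fall on the same side of most hyperplanes, so
# similar images receive codes with small Hamming distance; candidate
# retrieval then compares small integers rather than raw feature vectors.
hasher = make_hasher(feature_dim=512)
code = hasher(np.random.default_rng(1).standard_normal(512))
```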
