IEEE Trans Pattern Anal Mach Intell. 2011 Dec;33(12):2368-82. doi: 10.1109/TPAMI.2011.131. Epub 2011 Jun 30.
While there has been much recent work on object recognition and image understanding, the focus has been on carefully building mathematical models for images, scenes, and objects. In this paper, we propose a novel, nonparametric approach to object recognition and scene parsing using a new technique we call label transfer. For an input image, our system first retrieves its nearest neighbors from a large database of fully annotated images. It then establishes dense correspondences between the input image and each nearest neighbor using the dense SIFT flow algorithm [28], which aligns two images based on local image structures. Finally, using the dense scene correspondences obtained from SIFT flow, the system warps the existing annotations and integrates multiple cues in a Markov random field (MRF) framework to segment and recognize the query image. Our nonparametric scene parsing system achieves promising experimental results on challenging databases. Compared with existing object recognition approaches, which require training classifiers or appearance models for each object category, our system is easy to implement, has few parameters, and naturally embeds contextual information in the retrieval/alignment procedure.
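The retrieve, align, warp and fuse pipeline described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and several pieces are simplifying assumptions. Images are same-sized grids of small hypothetical descriptor tuples standing in for dense SIFT. Retrieval ranks images by their mean descriptor, an assumed global feature. The correspondence step is a per-pixel windowed search that omits SIFT flow's smoothness term. Fusion uses similarity-weighted label votes plus a Potts smoothness term minimized by iterated conditional modes (ICM), a swapped-in stand-in for the paper's MRF. All function names (`retrieve`, `match`, `label_transfer`) are hypothetical.

```python
"""Toy sketch of nonparametric label transfer: retrieve -> align -> warp -> fuse."""
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, ...]
Grid = List[List[Feature]]   # H x W per-pixel descriptors (hypothetical stand-in for dense SIFT)
Labels = List[List[str]]     # H x W object-category annotation
Example = Tuple[Grid, Labels]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def global_descriptor(grid: Grid) -> Feature:
    """Assumption: the mean per-pixel descriptor stands in for a global scene descriptor."""
    flat = [f for row in grid for f in row]
    return tuple(sum(c) / len(flat) for c in zip(*flat))


def retrieve(query: Grid, database: List[Example], k: int) -> List[Example]:
    """Step 1: the k annotated images whose global descriptors are closest to the query."""
    q = global_descriptor(query)
    return sorted(database, key=lambda ex: sq_dist(q, global_descriptor(ex[0])))[:k]


def match(query: Grid, other: Grid, radius: int) -> Dict[Tuple[int, int], Tuple[float, int, int]]:
    """Step 2 (simplified): each query pixel picks its most similar pixel in a
    (2*radius+1)^2 window of `other` (assumed to be the same size as the query).
    Unlike SIFT flow, no smoothness penalty couples neighboring displacements."""
    h, w = len(query), len(query[0])
    flow = {}
    for y in range(h):
        for x in range(w):
            best = None
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        cost = sq_dist(query[y][x], other[yy][xx])
                        if best is None or cost < best[0]:
                            best = (cost, yy, xx)
            flow[(y, x)] = best
    return flow


def label_transfer(query: Grid, database: List[Example], k: int = 2,
                   radius: int = 1, smooth: float = 0.5, iters: int = 3) -> Labels:
    h, w = len(query), len(query[0])
    # Step 3: warp each neighbor's labels through the correspondences; a vote is
    # weighted higher when the matched descriptors are more similar.
    votes = [[Counter() for _ in range(w)] for _ in range(h)]
    for grid, labels in retrieve(query, database, k):
        for (y, x), (cost, yy, xx) in match(query, grid, radius).items():
            votes[y][x][labels[yy][xx]] += 1.0 / (1.0 + cost)
    label_set = sorted({lab for row in votes for cell in row for lab in cell})
    # Step 4 (simplified MRF): start from the per-pixel winning label, then run
    # ICM on energy = -vote + smooth * (number of 4-neighbors with a different label).
    result = [[max(label_set, key=lambda lab: votes[y][x][lab]) for x in range(w)]
              for y in range(h)]
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                nbrs = [result[yy][xx]
                        for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= yy < h and 0 <= xx < w]
                result[y][x] = min(label_set, key=lambda lab: -votes[y][x][lab]
                                   + smooth * sum(n != lab for n in nbrs))
    return result


if __name__ == "__main__":
    sky, road = ["sky"] * 3, ["road"] * 3
    db = [
        ([[(1.0,)] * 3, [(0.0,)] * 3], [sky, road]),
        ([[(0.9,)] * 3, [(0.1,)] * 3], [sky, road]),
        ([[(5.0,)] * 3, [(5.0,)] * 3], [["building"] * 3] * 2),
    ]
    query = [[(0.95,), (1.0,), (0.9,)], [(0.05,), (0.0,), (0.1,)]]
    print(label_transfer(query, db))  # expected: [['sky', 'sky', 'sky'], ['road', 'road', 'road']]
```

In this sketch, as in the approach the abstract describes, no per-category classifier is trained. Adding an annotated image to `database` is enough for its labels to become transferable. Swapping `match` for a real SIFT flow solver and `global_descriptor` for a proper scene descriptor would bring the toy closer to the described pipeline.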