Google Los Angeles (US-LAX-BIN), 340 Main Street, Venice, CA 90291, USA.
IEEE Trans Pattern Anal Mach Intell. 2012 Sep;34(9):1842-55. doi: 10.1109/TPAMI.2011.268.
Which one comes first: segmentation or recognition? We propose a unified framework for carrying out the two simultaneously and without supervision. The framework combines a flexible probabilistic model, for representing the shape and appearance of each segment, with the popular “bag of visual words” model for recognition. If applied to a collection of images, our framework can simultaneously discover the segments of each image and the correspondence between such segments, without supervision. Such recurring segments may be thought of as the “parts” of corresponding objects that appear multiple times in the image collection. Thus, the model may be used for learning new categories, detecting/classifying objects, and segmenting images, without using expensive human annotation.
分割还是识别?我们提出了一个统一的框架来同时进行这两个任务,而且无需监督。该框架结合了一个灵活的概率模型,用于表示每个分割的形状和外观,以及流行的“视觉词汇袋”模型用于识别。如果应用于图像集合,我们的框架可以同时发现每个图像的分割以及这些分割之间的对应关系,无需监督。这些重复出现的分割可以被认为是对应对象的“部分”,它们在图像集合中多次出现。因此,该模型可用于学习新类别、检测/分类对象和分割图像,而无需使用昂贵的人工注释。