Bosch Anna, Zisserman Andrew, Muñoz Xavier
Computer Vision and Robotics Group, Universitat de Girona, Campus Montilivi, Avenida Lluís Santaló s/n, Girona, Spain.
IEEE Trans Pattern Anal Mach Intell. 2008 Apr;30(4):712-27. doi: 10.1109/TPAMI.2007.70716.
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. Specifically, we are given a set of labelled images of scenes (e.g. coast, forest, city, river) and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature, here applied to a bag of visual words representation of each image, and subsequently training a multi-way classifier on the topic distribution vector of each image. We compare this approach to that of representing each image directly by a bag of visual words vector and training a multi-way classifier on these vectors. To this end we introduce a novel vocabulary using dense colour SIFT descriptors, and then investigate how classification performance changes with the size of the visual vocabulary, the number of latent topics learnt, and the type of discriminative classifier used (k-nearest neighbour or SVM). We achieve better classification performance than recent publications that have used a bag of visual words representation, in all cases using those authors' own datasets and testing protocols. We also investigate the gain from adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
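The two-stage pipeline described above (pLSA topic discovery followed by a discriminative classifier on the topic vectors) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `plsa`, `knn_predict` and `make_toy_counts` are hypothetical, the word counts are synthetic rather than quantised dense colour SIFT descriptors, and unseen images are handled by rerunning EM with the topic-word distributions held fixed, which is one common way to obtain topic vectors for new documents.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0, p_w_z=None):
    """Fit pLSA by EM on a (n_images, n_words) count matrix.

    Returns P(z|d) with shape (n_images, n_topics) and P(w|z) with shape
    (n_topics, n_words). If p_w_z is given it is kept fixed and only
    P(z|d) is estimated (illustrative handling of unseen images).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    fixed_topics = p_w_z is not None
    if not fixed_topics:
        p_w_z = rng.random((n_topics, n_words))
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + eps
        # M-step: re-estimate the distributions from expected counts.
        expected = counts[:, None, :] * post
        if not fixed_topics:
            p_w_z = expected.sum(axis=0)
            p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z


def knn_predict(train_x, train_y, test_x, k=3):
    """Label each test vector by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])


def make_toy_counts(rng, n_per_class=10, n_words=6, n_tokens=40):
    """Synthetic bag-of-words counts for two classes, each favouring a
    different half of the vocabulary (a stand-in for real visual words)."""
    half = n_words // 2
    profiles = np.full((2, n_words), 0.02)
    profiles[0, :half] = 1.0
    profiles[1, half:] = 1.0
    profiles /= profiles.sum(axis=1, keepdims=True)
    labels = np.repeat([0, 1], n_per_class)
    counts = np.stack([rng.multinomial(n_tokens, profiles[c]) for c in labels])
    return counts.astype(float), labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_counts, train_y = make_toy_counts(rng)
    test_counts, test_y = make_toy_counts(rng, n_per_class=4)

    # Stage 1: learn topics on training images, then infer topic vectors
    # for the test images with the learnt P(w|z) held fixed.
    train_topics, p_w_z = plsa(train_counts, n_topics=2)
    test_topics, _ = plsa(test_counts, n_topics=2, p_w_z=p_w_z)

    # Stage 2: classify in the low-dimensional topic space.
    predicted = knn_predict(train_topics, train_y, test_topics, k=3)
    print("accuracy:", float((predicted == test_y).mean()))
```

The baseline the abstract compares against corresponds to calling `knn_predict` on the (normalised) count vectors themselves instead of on the topic vectors; an SVM could be substituted for the nearest-neighbour step in either case.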