Department of EEIS, University of Science and Technology of China, Hefei, China.
IEEE Trans Image Process. 2010 Jul;19(7):1908-20. doi: 10.1109/TIP.2010.2045169. Epub 2010 Mar 11.
The Bag-of-Words (BoW) model is a promising image representation technique for image categorization and annotation. One critical limitation of existing BoW models is that much semantic information is lost during codebook generation, an important step of BoW. This is because the codebook is usually built simply by clustering visual features in Euclidean space. However, visual features related to the same semantics may not fall into the same clusters in Euclidean space, primarily because of the semantic gap between low-level features and high-level semantics. In this paper, we propose a novel scheme to learn optimized BoW models, which aims to map semantically related features to the same visual words. In particular, we treat the distance between semantically identical features as a measure of the semantic gap and learn an optimized codebook by minimizing this gap, so that the loss of semantics is minimal. We refer to this kind of codebook as a semantics-preserving codebook (SPC) and to the corresponding model as the Semantics-Preserving Bag-of-Words (SPBoW) model. Extensive experiments on image annotation and object detection tasks, using public testbeds from MIT's LabelMe and the PASCAL VOC challenge databases, show that the proposed SPC learning scheme effectively optimizes the codebook generation process and that the SPBoW model substantially improves on the performance of the existing BoW model.
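The abstract contrasts the conventional codebook, built by clustering features in Euclidean space, with a codebook learned to keep semantically identical features on the same visual word. The Python sketch below illustrates only the baseline side and the kind of gap the paper targets; it is not the authors' SPC learning algorithm. It assumes k-means with farthest-first initialization as the Euclidean clustering step, and the helpers `build_codebook`, `bow_histogram`, `semantic_gap` and `same_word_rate` are hypothetical names. `semantic_gap` uses the plain mean Euclidean distance between same-label features as a stand-in for the paper's gap measure.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not code from the SPBoW paper.
Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(f: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the visual word (codebook center) closest to f in Euclidean space."""
    return min(range(len(codebook)), key=lambda i: sq_dist(f, codebook[i]))


def build_codebook(features: Sequence[Vector], k: int, iters: int = 50) -> List[Vector]:
    """Baseline BoW codebook (assumed): k-means in Euclidean space, farthest-first init."""
    centers = [features[0]]
    while len(centers) < k:
        # Add the feature farthest from every current center (deterministic init).
        centers.append(max(features, key=lambda f: min(sq_dist(f, c) for c in centers)))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in centers]
        for f in features:
            clusters[nearest_word(f, centers)].append(f)
        # Move each center to the mean of its members; an empty cluster keeps its center.
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def bow_histogram(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Encode one image as the normalized histogram of its visual-word assignments."""
    hist = [0.0] * len(codebook)
    for f in features:
        hist[nearest_word(f, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def semantic_gap(features: Sequence[Vector], labels: Sequence[str]) -> float:
    """Stand-in gap score (assumed): mean Euclidean distance over same-label pairs."""
    dists = [
        math.dist(a, b)
        for (a, la), (b, lb) in combinations(zip(features, labels), 2)
        if la == lb
    ]
    return sum(dists) / len(dists) if dists else 0.0


def same_word_rate(features: Sequence[Vector], labels: Sequence[str],
                   codebook: Sequence[Vector]) -> float:
    """Fraction of same-label feature pairs quantized to the same visual word."""
    words = [nearest_word(f, codebook) for f in features]
    hits = [
        wa == wb
        for (wa, la), (wb, lb) in combinations(zip(words, labels), 2)
        if la == lb
    ]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    # Toy 2-D "features": label A occupies two distant regions of Euclidean space.
    feats = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (5.0, 0.0), (5.1, 0.0)]
    labels = ["A", "A", "A", "A", "B", "B"]
    codebook = build_codebook(feats, k=3)
    print(bow_histogram(feats[:2] + feats[4:5], codebook))
    print(same_word_rate(feats, labels, codebook))  # 3 of 7 same-label pairs share a word
    print(semantic_gap(feats, labels))
```

In the toy data, label A sits in two distant regions, so Euclidean k-means splits it across two visual words and only 3 of the 7 same-label pairs share a word. A semantics-preserving codebook, in the sense the abstract describes, would aim to raise that fraction by reducing the distance between semantically identical features; the sketch does not attempt that learning step.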