Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China.
IEEE Trans Image Process. 2012 Apr;21(4):2282-93. doi: 10.1109/TIP.2011.2176950. Epub 2011 Nov 22.
A visual codebook serves as a fundamental component in many state-of-the-art computer vision systems. Most existing codebooks are built by quantizing local feature descriptors extracted from training images. Each image is then represented as a high-dimensional bag-of-words histogram. This representation is highly redundant: only a few bins are nonzero, and they are sparsely distributed, which makes it inefficient for both storage and retrieval. Furthermore, most existing codebooks are built solely from the visual statistics of local descriptors, without considering the supervision labels available from the subsequent recognition or classification tasks. In this paper, we propose a task-dependent codebook compression framework to address these two problems. First, we learn a compression function that maps an original high-dimensional codebook to a compact codebook while maintaining its visual discriminability. This is achieved by a codeword sparse coding scheme with Lasso regression, which minimizes the descriptor distortion of training images after codebook compression. Second, we adapt the codebook compression to the subsequent recognition or classification task by introducing a label constraint kernel (LCK) into the compression loss function. In particular, the LCK can model heterogeneous kinds of supervision, i.e., (partial) category labels, correlative semantic annotations, and image query logs. We validated our codebook compression on three computer vision tasks: 1) object recognition on PASCAL Visual Object Classes (VOC) 2007; 2) near-duplicate image retrieval on UKBench; and 3) web image search in a collection of 0.5 million Flickr photographs. Our compressed codebook shows superior performance over several state-of-the-art supervised and unsupervised codebooks.
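The abstract does not give the compression loss in closed form, so the following is only a minimal sketch of the general idea of Lasso-based codeword sparse coding, not the authors' formulation. It assumes that a compact set of basis codewords is already available, that each original codeword is approximated as a sparse combination of those basis codewords, and that a bag-of-words histogram is compressed by accumulating the resulting sparse codes. The function names (`lasso_code`, `compress_histogram`), the penalty weight `alpha`, and the toy data are all hypothetical, and the label constraint kernel is not modeled here.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_code(target, basis, alpha, n_sweeps=100):
    """Coordinate-descent Lasso for one codeword.

    Minimizes 0.5 * ||target - sum_j w[j] * basis[j]||^2 + alpha * sum_j |w[j]|.
    target: list of d floats (one original codeword).
    basis:  list of M lists of d floats (assumed compact codewords).
    Returns the list of M sparse weights.
    """
    weights = [0.0] * len(basis)
    residual = list(target)  # target minus current reconstruction (weights start at 0)
    sq_norms = [sum(x * x for x in b) for b in basis]
    for _ in range(n_sweeps):
        for j, b in enumerate(basis):
            if sq_norms[j] == 0.0:
                continue  # an all-zero basis vector cannot explain anything
            # Correlation with the residual that excludes basis j's own contribution.
            rho = sum(bi * ri for bi, ri in zip(b, residual)) + sq_norms[j] * weights[j]
            new_weight = soft_threshold(rho, alpha) / sq_norms[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [ri - delta * bi for ri, bi in zip(residual, b)]
                weights[j] = new_weight
    return weights


def compress_histogram(histogram, codes):
    """Map a K-bin histogram to M bins using one sparse code per original codeword.

    This accumulation rule is an illustrative assumption, not taken from the paper.
    """
    n_compact = len(codes[0])
    return [
        sum(count * code[m] for count, code in zip(histogram, codes))
        for m in range(n_compact)
    ]


if __name__ == "__main__":
    # Toy data: 4 original codewords in 2-D, compressed onto 2 basis codewords.
    compact_basis = [[1.0, 0.0], [0.0, 1.0]]
    original_codebook = [[1.0, 0.0], [0.9, 0.05], [0.0, 1.0], [0.1, 1.1]]
    codes = [lasso_code(c, compact_basis, alpha=0.1) for c in original_codebook]
    bow_histogram = [2, 1, 0, 3]  # word counts over the 4 original codewords
    print(codes)
    print(compress_histogram(bow_histogram, codes))
```

In this toy setting the L1 penalty zeroes out the small off-axis components, so each original codeword is assigned to a single basis codeword and the 4-bin histogram collapses to 2 bins. A real system would use a tested Lasso solver and a learned compact codebook in place of the hand-picked basis.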