Task-dependent visual-codebook compression.

Affiliation

Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China.

Publication information

IEEE Trans Image Process. 2012 Apr;21(4):2282-93. doi: 10.1109/TIP.2011.2176950. Epub 2011 Nov 22.

Abstract

A visual codebook is a fundamental component of many state-of-the-art computer vision systems. Most existing codebooks are built by quantizing local feature descriptors extracted from training images, and each image is then represented as a high-dimensional bag-of-words histogram. Such a highly redundant image description is inefficient in both storage and retrieval, because only a few bins are nonzero and they are sparsely distributed. Furthermore, most existing codebooks are built solely from the visual statistics of local descriptors, without considering the supervision labels available from the subsequent recognition or classification tasks. In this paper, we propose a task-dependent codebook compression framework to address these two problems. First, we propose to learn a compression function that maps an originally high-dimensional codebook into a compact codebook while maintaining its visual discriminability. This is achieved by a codeword sparse coding scheme with Lasso regression, which minimizes the descriptor distortions of training images after codebook compression. Second, we propose to adapt our codebook compression to the subsequent recognition or classification tasks. This is achieved by introducing a label constraint kernel (LCK) into our compression loss function. In particular, our LCK can model heterogeneous kinds of supervision, namely (partial) category labels, correlative semantic annotations, and image query logs. We validated our codebook compression in three computer vision tasks: 1) object recognition on PASCAL Visual Object Classes 2007 (VOC 2007); 2) near-duplicate image retrieval on UKBench; and 3) web image search in a collection of 0.5 million Flickr photographs. Our compressed codebook outperformed several state-of-the-art supervised and unsupervised codebooks.
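The codeword sparse coding idea can be illustrated with a small, pure-Python sketch. It is not the authors' implementation and rests on several assumptions: the compact codebook (`atoms`) is given rather than learned, the label constraint kernel is omitted, and the pooling rule in `compress_histogram` (each original bin passes its count to compact bins in proportion to the absolute Lasso coefficients) is a hypothetical choice. Each original codeword c_i is coded by minimizing 0.5 * ||c_i - D a_i||^2 + lam * ||a_i||_1 with cyclic coordinate descent. All function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_code(x, atoms, lam, n_iter=100):
    """Sparse-code vector x over `atoms` (compact codewords) by cyclic coordinate
    descent on 0.5 * ||x - sum_j a_j * atoms[j]||^2 + lam * ||a||_1."""
    a = [0.0] * len(atoms)
    residual = list(x)  # invariant: residual = x - sum_j a_j * atoms[j]
    for _ in range(n_iter):
        for j, atom in enumerate(atoms):
            norm_sq = sum(v * v for v in atom)
            if norm_sq == 0.0:
                continue
            # Correlation of atom j with the residual that excludes atom j's own contribution.
            rho = sum(atom[t] * (residual[t] + a[j] * atom[t]) for t in range(len(x)))
            new_aj = soft_threshold(rho, lam) / norm_sq
            delta = new_aj - a[j]
            if delta != 0.0:
                for t in range(len(x)):
                    residual[t] -= delta * atom[t]
                a[j] = new_aj
    return a


def bow_histogram(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codeword (squared Euclidean) and count."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        dists = [sum((p - q) ** 2 for p, q in zip(desc, word)) for word in codebook]
        hist[dists.index(min(dists))] += 1
    return hist


def compress_histogram(hist, codes):
    """Hypothetical pooling: compact bin j accumulates |a_ij| * hist[i] over original bins i."""
    out = [0.0] * len(codes[0])
    for count, code in zip(hist, codes):
        for j, weight in enumerate(code):
            out[j] += count * abs(weight)
    return out


if __name__ == "__main__":
    atoms = [[1.0, 0.0], [0.0, 1.0]]                               # toy compact codebook (k = 2)
    codebook = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]    # toy original codebook (K = 4)
    codes = [lasso_code(c, atoms, lam=0.1) for c in codebook]
    descriptors = [[0.9, 0.1], [1.1, 0.0], [0.0, 2.1], [1.8, 0.2]]
    hist = bow_histogram(descriptors, codebook)
    print(hist)  # [2, 0, 1, 1]
    print(compress_histogram(hist, codes))
```

With orthonormal atoms, as in this toy example, each coefficient is just the soft-thresholded projection, so lam = 0.1 shrinks every nonzero weight by 0.1 after one pass; with a non-orthogonal compact codebook, the coordinate-descent loop needs several passes to converge.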

Similar articles

USB: ultrashort binary descriptor for fast visual matching and retrieval.

IEEE Trans Image Process. 2014 Aug;23(8):3671-83. doi: 10.1109/TIP.2014.2330794. Epub 2014 Jun 12.

Supervised learning of quantizer codebooks by information loss minimization.

IEEE Trans Pattern Anal Mach Intell. 2009 Jul;31(7):1294-309. doi: 10.1109/TPAMI.2008.138.

A statistical framework for image category search from a mental picture.

IEEE Trans Pattern Anal Mach Intell. 2009 Jun;31(6):1087-101. doi: 10.1109/TPAMI.2008.259.

Learning semantic and visual similarity for endomicroscopy video retrieval.

IEEE Trans Med Imaging. 2012 Jun;31(6):1276-88. doi: 10.1109/TMI.2012.2188301. Epub 2012 Feb 16.

Content based image retrieval using unclean positive examples.

IEEE Trans Image Process. 2009 Oct;18(10):2370-5. doi: 10.1109/TIP.2009.2026669. Epub 2009 Jul 6.

Medical image retrieval with probabilistic multi-class support vector machine classifiers and adaptive similarity fusion.

Comput Med Imaging Graph. 2008 Mar;32(2):95-108. doi: 10.1016/j.compmedimag.2007.10.001. Epub 2007 Nov 26.

Perceptually lossless medical image coding.

IEEE Trans Med Imaging. 2006 Mar;25(3):335-44. doi: 10.1109/TMI.2006.870483.

Learning image similarity from Flickr groups using fast kernel machines.

IEEE Trans Pattern Anal Mach Intell. 2012 Nov;34(11):2177-88. doi: 10.1109/TPAMI.2012.29.

Conjunctive patches subspace learning with side information for collaborative image retrieval.

IEEE Trans Image Process. 2012 Aug;21(8):3707-20. doi: 10.1109/TIP.2012.2195014. Epub 2012 Apr 17.

Cited by

An Automatic Classification Method on Chronic Venous Insufficiency Images.

Sci Rep. 2018 Dec 18;8(1):17952. doi: 10.1038/s41598-018-36284-5.

An Efficient Augmented Lagrangian Method for Statistical X-Ray CT Image Reconstruction.

PLoS One. 2015 Oct 23;10(10):e0140579. doi: 10.1371/journal.pone.0140579. eCollection 2015.

Robust Optical Recognition of Cursive Pashto Script Using Scale, Rotation and Location Invariant Approach.

PLoS One. 2015 Sep 14;10(9):e0133648. doi: 10.1371/journal.pone.0133648. eCollection 2015.

A Probabilistic Analysis of Sparse Coded Feature Pooling and Its Application for Image Retrieval.

PLoS One. 2015 Jul 1;10(7):e0131721. doi: 10.1371/journal.pone.0131721. eCollection 2015.

Remote safety monitoring for elderly persons based on omni-vision analysis.

PLoS One. 2015 May 15;10(5):e0124068. doi: 10.1371/journal.pone.0124068. eCollection 2015.

A time-critical adaptive approach for visualizing natural scenes on different devices.

PLoS One. 2015 Feb 27;10(2):e0117586. doi: 10.1371/journal.pone.0117586. eCollection 2015.

Parameter estimation of fractional-order chaotic systems by using quantum parallel particle swarm optimization algorithm.

PLoS One. 2015 Jan 20;10(1):e0114910. doi: 10.1371/journal.pone.0114910. eCollection 2015.

A combined approach to cartographic displacement for buildings based on skeleton and improved elastic beam algorithm.

PLoS One. 2014 Dec 3;9(12):e113953. doi: 10.1371/journal.pone.0113953. eCollection 2014.

A lightweight distributed framework for computational offloading in mobile cloud computing.

PLoS One. 2014 Aug 15;9(8):e102270. doi: 10.1371/journal.pone.0102270. eCollection 2014.

On-device mobile visual location recognition by using panoramic images and compressed sensing based visual descriptors.

PLoS One. 2014 Jun 3;9(6):e98806. doi: 10.1371/journal.pone.0098806. eCollection 2014.
