Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA.
IEEE Trans Image Process. 2010 Jul;19(7):1933-47. doi: 10.1109/TIP.2010.2045019. Epub 2010 Mar 8.
A new framework for content-based image retrieval, which takes advantage of the source characterization property of a universal source coding scheme, is investigated. Based upon a new class of multidimensional incremental parsing algorithms, extended from the Lempel-Ziv incremental parsing code, the proposed method captures the occurrence pattern of visual elements in a given image. A linguistic processing technique, latent semantic analysis (LSA), is then employed to identify associative ensembles of visual elements, which lay the foundation for intelligent visual information analysis. In 2-D applications, incremental parsing decomposes an image into elementary patches that differ from the conventional fixed square-block patches. When used in compressive representations, it is amenable to schemes that do not rely on average distortion criteria, a departure from conventional vector quantization. We call this methodology a parsed representation. In this article, we present our implementations of an image retrieval system, called IPSILON, with parsed representations induced by different perceptual distortion thresholds. We evaluate the effectiveness of the parsed representations by comparing their performance with that of four image retrieval systems: one using conventional vector quantization for visual information analysis under the same LSA paradigm, another using a method called SIMPLIcity, which is based upon image segmentation and integrated region matching, and the other two based upon query-by-semantic-example and query-by-visual-example. The first two were tested with 20,000 images of natural scenes, and the other two were tested with a portion of those images.
The experimental results show that the proposed parsed representation efficiently captures the salient features in visual images, and that the IPSILON systems outperform the other systems in terms of retrieval precision and distortion robustness.
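
For readers unfamiliar with the Lempel-Ziv incremental parsing that the proposed method extends, the classical one-dimensional scheme can be sketched as follows. This is only an illustration of the 1-D idea on a symbol string; it is not the multidimensional algorithm of the paper, it uses exact matching rather than a perceptual distortion threshold, and the function name `incremental_parse` is hypothetical.

```python
def incremental_parse(sequence):
    """Split `sequence` into LZ78-style phrases.

    Each new phrase is the shortest prefix of the remaining input that
    has not yet been seen as a phrase (assumption: exact matching, 1-D).
    """
    dictionary = set()   # phrases seen so far
    phrases = []         # parsed output, in order of occurrence
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in dictionary:
            # First time this phrase occurs: record it and start over.
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Input ended inside a phrase that is already in the dictionary.
        phrases.append(current)
    return phrases


print(incremental_parse("aababcabcd"))
```

The resulting phrase list plays the role that the parsed patches play in the 2-D case: the set of phrases and how often each occurs characterizes the source, and such occurrence counts are the kind of input an LSA stage would consume.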