A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression.

Affiliation

Department of Electronic Engineering, Fudan University, Shanghai, 200433, China.

Publication information

IEEE Trans Image Process. 2010 Jan;19(1):185-98. doi: 10.1109/TIP.2009.2030969.

Abstract

Salient areas in natural scenes are generally regarded as the areas on which the human eye typically focuses, and finding these areas is the key step in object detection. In computer vision, many models have been proposed to simulate eye behavior, such as SaliencyToolBox (STB), Neuromorphic Vision Toolkit (NVT), and others, but they are computationally expensive, and whether they produce useful results depends largely on the choice of parameters. Although some region-based approaches were proposed to reduce the computational complexity of feature maps, these approaches still could not run in real time. Recently, a simple and fast approach called spectral residual (SR) was proposed, which uses the SR of the amplitude spectrum to calculate the image's saliency map. However, in our previous work, we pointed out that it is the phase spectrum, not the amplitude spectrum, of an image's Fourier transform that is key to locating salient areas, and proposed the phase spectrum of Fourier transform (PFT) model. In this paper, we present a quaternion representation of an image which is composed of intensity, color, and motion features. Based on the principle of PFT, we propose a novel multiresolution spatiotemporal saliency detection model called phase spectrum of quaternion Fourier transform (PQFT), which calculates the spatiotemporal saliency map of an image from its quaternion representation. Unlike other models, the added motion dimension allows the phase spectrum to represent spatiotemporal saliency, so that attention selection can be performed not only for images but also for videos. In addition, the PQFT model can compute the saliency map of an image at various resolutions, from coarse to fine. Therefore, a hierarchical selectivity (HS) framework based on the PQFT model is introduced here to construct a tree structure representation of an image.
With the help of HS, a model called multiresolution wavelet domain foveation (MWDF) is proposed in this paper to improve coding efficiency in image and video compression. Extensive tests on videos, natural images, and psychological patterns show that the proposed PQFT model is more effective in saliency detection and predicts eye fixations better than other state-of-the-art models in the literature. Moreover, our model has a low computational cost and can therefore work in real time. Additional experiments on image and video compression show that the HS-MWDF model can achieve a higher compression rate than the traditional model.
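
The phase-spectrum idea behind PFT can be illustrated with a minimal sketch. It is not the authors' implementation and it covers only the single-channel PFT case, not the quaternion transform used by PQFT. It assumes the commonly described recipe: take the 2-D Fourier transform, discard the amplitude so only the phase remains, transform back, square the magnitude, and smooth the result. The function names, the Gaussian width `sigma`, and the final normalization are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel with odd length (radius of about 3 sigma)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(image, sigma):
    """Separable Gaussian smoothing: filter every row, then every column.

    Assumes each image side is at least as long as the kernel.
    """
    kernel = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, image
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, rows
    )


def pft_saliency(image, sigma=3.0):
    """Phase-only saliency map of a 2-D grayscale image (illustrative sketch).

    sigma is an assumed smoothing width, not a value taken from the paper.
    """
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))
    # Keep the phase and set every amplitude to 1.
    phase_only = np.exp(1j * np.angle(spectrum))
    reconstruction = np.fft.ifft2(phase_only)
    saliency = smooth(np.abs(reconstruction) ** 2, sigma)
    # Scale to [0, 1] for convenience; this step is an assumption.
    return saliency / saliency.max()


if __name__ == "__main__":
    # A single bright pixel on a dark background as a toy input.
    demo = np.zeros((64, 64))
    demo[20, 40] = 1.0
    saliency_map = pft_saliency(demo)
    print(np.unravel_index(np.argmax(saliency_map), saliency_map.shape))
```

Running the same function on downsampled copies of an image would give a coarse-to-fine set of maps in the spirit of the multiresolution analysis described above, although the resolutions to use are not stated in the abstract.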
