An embedded saliency map estimator scheme: application to video encoding.

Authors

Tsapatsoulis Nicolas, Rapantzikos Konstantinos, Pattichis Constantinos

Affiliation

Department of Computer Science, University of Cyprus, CY 1678, Cyprus.

Publication

Int J Neural Syst. 2007 Aug;17(4):289-304. doi: 10.1142/S0129065707001147.

DOI:10.1142/S0129065707001147

PMID:17696293

Abstract

In this paper we propose a novel saliency-based computational model for visual attention. The model processes both top-down (goal-directed) and bottom-up information. Processing in the top-down channel creates the so-called skin conspicuity map and emulates the visual search for human faces performed by humans. This is clearly a goal-directed task, but it is generic enough to be context independent. Processing in the bottom-up channel follows the principles set by Itti et al. but deviates from them by computing the orientation, intensity, and color conspicuity maps within a unified multi-resolution framework based on wavelet subband analysis. In particular, we apply a wavelet-based approach for efficient computation of the topographic feature maps. Given that wavelets and multiresolution theory are naturally connected, using wavelet decomposition to mimic the center-surround process in humans is an obvious choice. However, our implementation goes further: we use the wavelet decomposition for inline computation of the features (such as orientation angles) that are used to create the topographic feature maps. The bottom-up topographic feature maps and the top-down skin conspicuity map are then combined through a sigmoid function to produce the final saliency map. A prototype of the proposed model was realized on the TMDSDMK642-0E DSP platform as an embedded system allowing real-time operation. To evaluate perceived visual quality and video compression improvement, an ROI-based video compression setup was followed. Extensive experiments on both MPEG-1 and low bit-rate MPEG-4 video encoding showed significant improvement in video compression efficiency without perceived deterioration in visual quality.
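The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet, the orientation formula, or the exact form of the sigmoid combination. The sketch assumes a single-level Haar decomposition, an orientation angle taken as the arctangent of the two directional detail coefficients, and a fusion that passes the mean of the bottom-up maps plus the skin map through a logistic function. The names `haar_level`, `orientation_angle`, `fuse`, `gain`, and `offset` are hypothetical.

```python
import math

def haar_level(img):
    """One level of a 2-D Haar decomposition of an even-sized grid.

    Returns (approx, d_rows, d_cols, d_diag). Haar is an assumption;
    the abstract does not say which wavelet the authors used.
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    approx = [[0.0] * cols for _ in range(rows)]
    d_rows = [[0.0] * cols for _ in range(rows)]
    d_cols = [[0.0] * cols for _ in range(rows)]
    d_diag = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            approx[r][c] = (a + b + d + e) / 4.0  # local average
            d_rows[r][c] = (a + b - d - e) / 4.0  # top row minus bottom row
            d_cols[r][c] = (a - b + d - e) / 4.0  # left column minus right column
            d_diag[r][c] = (a - b - d + e) / 4.0  # diagonal difference
    return approx, d_rows, d_cols, d_diag

def orientation_angle(d_row, d_col):
    """Assumed orientation feature: angle of the two directional details."""
    return math.atan2(d_row, d_col)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(bottom_up_maps, skin_map, gain=4.0, offset=1.0):
    """Combine bottom-up maps with the top-down skin map via a sigmoid.

    gain and offset are illustrative parameters, not values from the paper.
    """
    fused = []
    for r in range(len(skin_map)):
        row = []
        for c in range(len(skin_map[0])):
            mean_bu = sum(m[r][c] for m in bottom_up_maps) / len(bottom_up_maps)
            row.append(sigmoid(gain * (mean_bu + skin_map[r][c] - offset)))
        fused.append(row)
    return fused

# Usage: a 2x2 patch with a horizontal edge between its two rows.
approx, d_rows, d_cols, d_diag = haar_level([[1.0, 1.0], [3.0, 3.0]])
angle = orientation_angle(d_rows[0][0], d_cols[0][0])
saliency = fuse([[[0.5]]], [[0.5]])
```

In an ROI-based encoder, a map like `saliency` would typically be thresholded to select the regions that receive more bits; the thresholding rule is not given in the abstract.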