Le Meur Olivier, Le Callet Patrick, Barba Dominique, Thoreau Dominique
Video Compression Laboratory, Thomson, Cesson-Sévigné, France.
IEEE Trans Pattern Anal Mach Intell. 2006 May;28(5):802-17. doi: 10.1109/TPAMI.2006.86.
Visual attention is a mechanism that filters out redundant visual information and detects the most relevant parts of our visual field. Automatic determination of the most visually relevant areas would be useful in many applications such as image and video coding, watermarking, video browsing, and quality assessment. Many research groups are currently investigating computational modeling of the visual attention system. The first published computational models were based on some basic and well-understood Human Visual System (HVS) properties. These models feature a single perceptual layer that simulates only one aspect of the visual system. More recent models integrate complex features of the HVS and simulate hierarchical perceptual representation of the visual input. The bottom-up mechanism is the most common feature found in modern models. This mechanism refers to involuntary attention (i.e., salient spatial visual features that effortlessly or involuntarily attract our attention). This paper presents a coherent computational approach to the modeling of bottom-up visual attention. This model is mainly based on the current understanding of HVS behavior. Contrast sensitivity functions, perceptual decomposition, visual masking, and center-surround interactions are some of the features implemented in this model. The performance of this algorithm is assessed using natural images and experimental measurements from an eye-tracking system. Two suitable, well-known metrics (correlation coefficient and Kullback-Leibler divergence) are used to validate this model. A further metric is also defined. The results from this model are finally compared to those from a reference bottom-up model.
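The two validation metrics named in the abstract, the correlation coefficient and the Kullback-Leibler divergence, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the maps are built or normalized, or which direction of the divergence is used. The function names, the `eps` smoothing term, and the choice of D(observed || predicted) are assumptions made for illustration only. Each map is taken to be a 2D grid of non-negative values, one predicted by a model and one derived from eye-tracking data.

```python
import math


def _flatten(grid):
    """Turn a 2D grid (list of rows) into a flat list of floats."""
    return [float(v) for row in grid for v in row]


def correlation_coefficient(predicted, observed):
    """Pearson linear correlation between two maps of equal size."""
    p, q = _flatten(predicted), _flatten(observed)
    if len(p) != len(q):
        raise ValueError("maps must have the same number of cells")
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_p * var_q)


def kl_divergence(observed, predicted, eps=1e-12):
    """D(observed || predicted) in nats, after scaling each map to sum to 1.

    eps is an assumed smoothing constant that avoids log(0) when a cell
    of the predicted map is empty.
    """
    p, q = _flatten(observed), _flatten(predicted)
    if len(p) != len(q):
        raise ValueError("maps must have the same number of cells")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("each map must contain positive mass")
    p = [v / sum_p for v in p]
    q = [v / sum_q for v in q]
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


if __name__ == "__main__":
    # Toy 2x2 maps, purely illustrative (not data from the paper).
    model_map = [[1, 2], [3, 4]]
    human_map = [[2, 2], [3, 5]]
    print(correlation_coefficient(model_map, human_map))
    print(kl_divergence(human_map, model_map))
```

Under these assumptions, a correlation close to 1 and a divergence close to 0 both indicate that the predicted map resembles the observed one. The correlation is symmetric in its arguments while the divergence is not, so the order of the arguments to `kl_divergence` matters.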