Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA.
IEEE Trans Pattern Anal Mach Intell. 2013 Feb;35(2):314-28. doi: 10.1109/TPAMI.2012.119.
This paper introduces a new computational visual-attention model for constructing static and dynamic saliency maps. First, we use the Earth Mover's Distance (EMD) to measure the center-surround difference in the receptive field, instead of the Difference-of-Gaussian filter widely used in previous visual-attention models. Second, we propose two biologically inspired nonlinear operations for combining features: subsets of basic features are first merged into super features using the Lm-norm, and the super features are then combined through a Winner-Take-All mechanism. Third, we extend the model to construct dynamic saliency maps from videos by using EMD to compute the center-surround difference over the spatiotemporal receptive field. We evaluate the model on both static image data and video data; comparisons under a unified evaluation setting show that the proposed model outperforms several existing models.
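To make the first contribution concrete, here is a minimal sketch of an EMD-based center-surround comparison, not the paper's implementation. It assumes center and surround regions are summarized as 1-D intensity histograms; for equal-mass 1-D histograms, EMD reduces to the L1 distance between their cumulative distributions. All function names here are hypothetical.

```python
# Hypothetical sketch: EMD between center and surround intensity
# histograms, as a stand-in for the Difference-of-Gaussian filter.
# For normalized 1-D histograms, EMD equals the L1 distance between
# the cumulative distribution functions.

def emd_1d(p, q):
    """EMD between two equal-mass 1-D histograms."""
    assert abs(sum(p) - sum(q)) < 1e-9, "histograms must have equal mass"
    cum, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi          # running CDF difference
        total += abs(cum)       # mass that must still be moved
    return total

def histogram(values, bins=16, lo=0.0, hi=256.0):
    """Normalized intensity histogram of a pixel list."""
    h = [0.0] * bins
    w = (hi - lo) / bins
    for v in values:
        h[min(int((v - lo) / w), bins - 1)] += 1.0
    n = float(len(values))
    return [x / n for x in h]

def center_surround_saliency(center_pixels, surround_pixels):
    """Saliency of a receptive field as the EMD between the
    center and surround intensity distributions."""
    return emd_1d(histogram(center_pixels), histogram(surround_pixels))
```

A bright patch on a dark background yields a large EMD (mass must travel far across intensity bins), while identical center and surround distributions yield zero; this distance-aware behavior is what distinguishes EMD from a bin-wise difference.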
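The second contribution, the two-step feature combination, can be sketched as follows. This is an illustrative reading, not the paper's code: feature maps are flattened to 1-D lists, the Lm-norm is applied per location to merge basic features into super features, and Winner-Take-All is taken here as a per-location maximum over super features. The map names and groupings are hypothetical.

```python
# Hypothetical sketch of the paper's two-step combination:
# (1) merge subsets of basic feature maps into "super features"
#     with a per-location Lm-norm;
# (2) combine super features with a Winner-Take-All competition
#     (modeled here as a per-location max).

def lm_norm_combine(maps, m=2.0):
    """Combine feature maps location-wise with an Lm-norm."""
    return [
        sum(abs(v) ** m for v in values) ** (1.0 / m)
        for values in zip(*maps)
    ]

def winner_take_all(super_maps):
    """Per-location winner-take-all across super-feature maps."""
    return [max(values) for values in zip(*super_maps)]

# Toy 1-D "maps" standing in for flattened 2-D feature maps.
intensity = [0.1, 0.9, 0.2]
color     = [0.3, 0.4, 0.1]
motion    = [0.0, 0.2, 0.8]
flicker   = [0.1, 0.1, 0.7]

super_static  = lm_norm_combine([intensity, color], m=2.0)
super_dynamic = lm_norm_combine([motion, flicker], m=2.0)
saliency = winner_take_all([super_static, super_dynamic])
```

The Lm-norm step lets strong responses within a feature subset dominate smoothly (larger m approaches a max), while the Winner-Take-All step enforces a hard competition between super features, so one strongly responding channel can determine the final saliency at each location.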