IEEE Trans Pattern Anal Mach Intell. 2015 Dec;37(12):2428-40. doi: 10.1109/TPAMI.2015.2424870.
There are two sides to the story of visual saliency modeling in the frequency domain. On the one hand, image saliency can be effectively estimated by applying simple operations to the frequency spectrum. On the other hand, it is still unclear which part of the frequency spectrum contributes the most to making targets pop out and suppressing distractors. To address this question, this paper tentatively explores the secret of image saliency in the frequency domain. From the results of several qualitative and quantitative experiments, we find that the secret of visual saliency may lie mainly in the phases of intermediate frequencies. To explain this finding, we reinterpret the discrete Fourier transform from the perspective of template-based contrast computation and derive several principles for designing a saliency detector in the frequency domain. Following these principles, we propose a novel approach that designs the saliency detector with the aid of prior knowledge obtained through both unsupervised and supervised learning. Experimental results on a public image benchmark show that the learned saliency detector outperforms 18 state-of-the-art approaches in predicting human fixations.
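To make the finding about intermediate-frequency phases concrete, the following is a minimal sketch of a band-limited, phase-only saliency map. It is not the learned detector proposed in the paper: it simply discards the amplitude spectrum, keeps the phase within a radial frequency band, and reconstructs. The function name `band_phase_saliency`, the band limits `low` and `high` (in cycles per pixel), and the smoothing width `sigma` are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def band_phase_saliency(image, low=0.05, high=0.35, sigma=3.0):
    """Phase-only saliency restricted to a band of radial frequencies.

    `low`, `high` (cycles/pixel) and `sigma` (pixels) are assumed values
    chosen for illustration, not parameters from the paper.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    h, w = img.shape

    spectrum = np.fft.fft2(img)

    # Discard amplitude: normalise every coefficient to unit magnitude,
    # zeroing coefficients that are numerically empty.
    magnitude = np.abs(spectrum)
    phase_only = np.where(
        magnitude > 1e-12, spectrum / np.maximum(magnitude, 1e-12), 0.0
    )

    # Radial frequency of every DFT coefficient, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Keep only the intermediate band (this also removes the DC term).
    band = (radius >= low) & (radius <= high)

    # Reconstruct from the band-limited phase and take the energy.
    energy = np.abs(np.fft.ifft2(phase_only * band)) ** 2

    # Gaussian smoothing applied as a multiplication in the frequency domain.
    gaussian = np.exp(-2.0 * (np.pi * sigma) ** 2 * radius ** 2)
    saliency = np.real(np.fft.ifft2(np.fft.fft2(energy) * gaussian))

    # Rescale to [0, 1]; an all-zero map is returned unchanged.
    saliency -= saliency.min()
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency
```

Calling `band_phase_saliency(gray)` on a 2-D grayscale array returns a map of the same shape scaled to [0, 1]. Varying `low` and `high` is one informal way to probe how much each part of the spectrum contributes, in the spirit of the experiments the abstract describes.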