IEEE Trans Neural Netw Learn Syst. 2016 Jun;27(6):1214-26. doi: 10.1109/TNNLS.2015.2480683. Epub 2015 Oct 7.
Predicting where people look in natural scenes has attracted considerable interest in computer vision and computational neuroscience over the past two decades. Two seemingly contrasting categories of cues have been proposed to influence where people look: 1) low-level image saliency and 2) high-level semantic information. Our first contribution is a detailed analysis of these cues that confirms the hypothesis, proposed by Henderson and by Nuthmann and Henderson, that observers tend to look at the centers of objects. We analyzed free-viewing fixation data from 17 observers on 60 object-annotated images containing various types of objects. The images covered different scene types, such as natural scenes, line drawings, and 3-D rendered scenes. Our second contribution is a simple model combining low-level saliency with an object center bias. This model significantly outperforms each of its individual components on our data, as well as on the Object and Semantic Images and Eye-tracking data set of Xu et al. These results reconcile the saliency and object-center-bias hypotheses and highlight that both types of cues are important in guiding fixations. Our work opens new directions for understanding the strategies humans use when observing scenes and objects. It also demonstrates how to construct combined models of low-level saliency and high-level object-based information.
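The abstract does not say how the two cues are combined. The sketch below assumes one simple option: a convex (weighted-sum) combination of a normalized low-level saliency map and an object-center bias map. The bias map places one Gaussian at the center of each annotated object bounding box. The names `object_center_map`, `normalize`, and `combined_map`, the `weight` parameter, and the `sigma_scale` width rule are all hypothetical illustrations. They are not the authors' model.

```python
import numpy as np

def object_center_map(shape, boxes, sigma_scale=0.25):
    """Hypothetical object-center bias map: one isotropic Gaussian per object,
    centered on its bounding-box center (x0, y0, x1, y1), with a width
    proportional to the larger box side (assumed rule, not from the paper)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) coordinate grids
    out = np.zeros(shape, dtype=float)
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        sigma = sigma_scale * max(x1 - x0, y1 - y0, 1)
        out += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    return out

def normalize(m):
    """Shift to non-negative values and scale to sum to 1 (a probability map).
    A constant map becomes a uniform distribution."""
    m = np.asarray(m, dtype=float)
    m = m - m.min()
    total = m.sum()
    return m / total if total > 0 else np.full(m.shape, 1.0 / m.size)

def combined_map(saliency, boxes, weight=0.5):
    """Assumed combination: weight * saliency + (1 - weight) * object-center bias,
    with both maps normalized first so the result is itself a probability map."""
    bias = object_center_map(saliency.shape, boxes)
    return weight * normalize(saliency) + (1.0 - weight) * normalize(bias)

# Usage: a random stand-in saliency map with two annotated objects.
rng = np.random.default_rng(0)
sal = rng.random((60, 80))
boxes = [(10, 10, 30, 40), (50, 20, 70, 50)]
fixation_pred = combined_map(sal, boxes, weight=0.5)
print(fixation_pred.shape, round(float(fixation_pred.sum()), 6))
```

A convex combination keeps the output a valid probability map, which makes it easy to compare against fixation data with distribution-based metrics. In practice, `weight` would be fitted on held-out fixations rather than fixed at 0.5.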