Kanan Christopher, Tong Mathew H, Zhang Lingyun, Cottrell Garrison W
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
Vis cogn. 2009 Aug 1;17(6-7):979-1003. doi: 10.1080/13506280902771138.
When people try to find particular objects in natural scenes, they make extensive use of knowledge about how and where objects tend to appear in a scene. Although many forms of such "top-down" knowledge have been incorporated into saliency map models of visual search, surprisingly, the role of object appearance has been infrequently investigated. Here we present an appearance-based saliency model derived in a Bayesian framework. We compare our approach with both bottom-up saliency algorithms and the state-of-the-art Contextual Guidance model of Torralba et al. (2006) at predicting human fixations. Although the two top-down approaches use very different types of information, they achieve similar performance, each performing substantially better than the purely bottom-up models. Our experiments reveal that a simple model of object appearance can predict human fixations quite well, even making the same mistakes as people.
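The appearance-based Bayesian idea sketched in the abstract can be illustrated with a toy likelihood-ratio computation. This is a hypothetical sketch, not the authors' published code: it assumes local appearance features are modeled with diagonal Gaussians, and scores each location by log p(features | target) − log p(features), so locations whose appearance resembles the target object receive high saliency.

```python
# Hypothetical sketch of appearance-based Bayesian saliency (not the
# authors' implementation). Saliency at each location is the log
# likelihood ratio log p(f | target) - log p(f): locations whose local
# appearance matches the target's appearance model stand out.
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal Gaussian, evaluated per row of x."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var), axis=1)

def appearance_saliency(features, target_mean, target_var, scene_mean, scene_var):
    """Log likelihood-ratio saliency: log p(f | target) - log p(f)."""
    return (gaussian_logpdf(features, target_mean, target_var)
            - gaussian_logpdf(features, scene_mean, scene_var))

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))   # 100 locations, 8-dim appearance features
feats[0] += 3.0                     # one location whose appearance matches the target
target_mean, target_var = np.full(8, 3.0), np.ones(8)   # assumed target model
scene_mean, scene_var = np.zeros(8), np.ones(8)         # assumed background model
sal = appearance_saliency(feats, target_mean, target_var, scene_mean, scene_var)
print(int(np.argmax(sal)))          # the target-like location is the most salient
```

In a full model the Gaussian parameters would be learned from labeled examples of the target category, and the features would come from filter responses over the image rather than random vectors.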