Parks Daniel, Borji Ali, Itti Laurent
Neuroscience Graduate Program, University of Southern California, 3641 Watt Way, Los Angeles, CA 90089, USA.
Department of Computer Science, University of Wisconsin - Milwaukee, PO Box 784, Milwaukee, WI 53211, USA.
Vision Res. 2015 Nov;116(Pt B):113-26. doi: 10.1016/j.visres.2014.10.027. Epub 2014 Nov 13.
Previous studies have shown that the gaze direction of actors in a scene influences the eye movements of passive observers during free viewing (Castelhano, Wieth, & Henderson, 2007; Borji, Parks, & Itti, 2014). However, no computational model has been proposed that combines bottom-up saliency with actors' head pose and gaze direction to predict where observers look. Here, we first learn probability maps that predict fixations leaving head regions (gaze-following fixations) as well as fixations on head regions (head fixations), both dependent on the actor's head size and pose angle. We then learn a combination of gaze-following, head-region, and bottom-up saliency maps using a Markov chain composed of head-region and non-head-region states. This simple structure allows us to inspect the model and comment on the nature of eye movements originating from heads as opposed to other regions. At this stage, we assume perfect knowledge of actor head pose direction (from an oracle). The combined model, which we call the Dynamic Weighting of Cues (DWOC) model, explains observers' fixations significantly better than each of its constituent components. Finally, in a fully automatic combined model, we replace the oracle head pose direction data with detections from a computer vision model of head pose. Using these (imperfect) automated detections, we again find that the combined model significantly outperforms its individual components. Our work extends the engineering and scientific applications of saliency models and helps to better understand the mechanisms of visual attention.
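The state-dependent combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`normalize`, `combine_cues`, `next_state`), the linear mixing rule, and the weight values are all assumptions made for illustration, and the real model learns its weights and transition structure from eye-tracking data. The sketch only shows the general idea: the weights applied to the saliency, head, and gaze-following maps depend on whether the previous fixation was on a head.

```python
from typing import Dict, List, Tuple

Map = List[List[float]]


def normalize(m: Map) -> Map:
    """Scale a non-negative map so that its entries sum to 1."""
    total = sum(sum(row) for row in m)
    if total <= 0:
        raise ValueError("map must have positive total mass")
    return [[v / total for v in row] for row in m]


# Hypothetical per-state weights: (saliency, head region, gaze following).
# Illustrative values only, NOT the weights learned in the paper.
# Assumption: the gaze-following cue only matters when leaving a head.
WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "head": (0.3, 0.2, 0.5),
    "non_head": (0.6, 0.4, 0.0),
}


def combine_cues(
    state: str,
    saliency: Map,
    head_map: Map,
    gaze_map: Map,
    weights: Dict[str, Tuple[float, float, float]] = WEIGHTS,
) -> Map:
    """Mix the three cue maps with weights chosen by the current state."""
    w_sal, w_head, w_gaze = weights[state]
    s, h, g = normalize(saliency), normalize(head_map), normalize(gaze_map)
    combined = [
        [w_sal * s[i][j] + w_head * h[i][j] + w_gaze * g[i][j]
         for j in range(len(s[0]))]
        for i in range(len(s))
    ]
    return normalize(combined)


def next_state(fixation: Tuple[int, int], head_mask: List[List[bool]]) -> str:
    """Return the state implied by a fixation at (row, col)."""
    row, col = fixation
    return "head" if head_mask[row][col] else "non_head"
```

With a uniform saliency map, a head at one pixel, and a gaze target at another, calling `combine_cues("head", ...)` puts most of the probability mass on the gaze target. Calling `combine_cues("non_head", ...)` ignores the gaze-following map entirely under the assumed weights.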