Dorr Michael, Vig Eleonora, Barth Erhardt
Institute for Neuro- and Bioinformatics, University of Lübeck, Ratzeburger Allee 160, D-23538 Lübeck, Germany
Vis Cogn. 2012 Jan 1;20(4-5):495-514. doi: 10.1080/13506285.2012.667456. Epub 2012 Mar 26.
We study the predictability of eye movements when viewing high-resolution natural videos. We use three recently published gaze data sets that contain a wide range of footage, from scenes of almost still-life character to professionally made, fast-paced advertisements and movie trailers. Inter-subject gaze variability differs significantly between data sets, with variability being lowest for the professional movies. We then evaluate three state-of-the-art saliency models on these data sets. A model that is based on the invariants of the structure tensor and that combines very generic, sparse video representations with machine learning techniques outperforms the two reference models; performance is further improved for two data sets when the model is extended to a perceptually inspired colour space. Finally, a combined analysis of gaze variability and predictability shows that eye movements on the professionally made movies are the most coherent (due to implicit gaze-guidance strategies of the movie directors), yet the least predictable (presumably due to the frequent cuts). Our results highlight the need for standardized benchmarks for the comparative evaluation of eye movement prediction algorithms.