Zanca Dario, Melacci Stefano, Gori Marco
IEEE Trans Pattern Anal Mach Intell. 2020 Dec;42(12):2983-2995. doi: 10.1109/TPAMI.2019.2920636. Epub 2020 Nov 3.
Understanding the mechanisms behind focus of attention in a visual scene is a problem of great interest in visual perception and computer vision. In this paper, we model the scanpath as a dynamic process that can be interpreted as a variational law related to mechanics, in which the focus of attention is subject to a gravitational field. The distributed virtual mass that drives eye movements is associated with the presence of details and motion in the video. Unlike most current models, the proposed approach does not estimate the saliency map directly; instead, predicting eye movements allows us to integrate the positions of interest over time. Inhibition of return is also supported within the same dynamic model, with the purpose of simulating fixations and saccades. The differential equations of motion of the proposed model are numerically integrated to simulate scanpaths on both images and videos. Experimental results for saliency and scanpath prediction on a wide collection of datasets are presented to support the theory. Top-level performance is achieved, especially in scanpath prediction, which is the primary purpose of the proposed model.
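The mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that the abstract does not state: the virtual mass is taken to be the brightness-gradient magnitude of a single grayscale frame (motion is ignored), the attraction decays as the inverse of distance with a softening constant, a viscous damping term is added, inhibition of return is a simple per-pixel suppression that grows near the current focus, and the equations of motion are integrated with a semi-implicit Euler step. The names `gradient_mass` and `simulate_scanpath` and all parameter values are hypothetical.

```python
import math


def gradient_mass(image):
    """Assumed virtual mass: finite-difference gradient magnitude per pixel."""
    h, w = len(image), len(image[0])
    mass = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Differences between neighbouring pixels, clamped at the border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mass[y][x] = math.hypot(gx, gy)
    return mass


def simulate_scanpath(image, start, steps=200, dt=0.05, damping=1.0,
                      softening=1.0, ior_radius=2.0, ior_rate=0.5):
    """Integrate the focus of attention as a damped particle in a mass field.

    Returns a list of (x, y) positions, one per time step plus the start.
    All parameters are illustrative assumptions, not values from the paper.
    """
    mass = gradient_mass(image)
    h, w = len(mass), len(mass[0])
    inhibition = [[0.0] * w for _ in range(h)]  # 0 = free, 1 = fully inhibited
    px, py = start
    vx = vy = 0.0
    path = [(px, py)]
    for _ in range(steps):
        ax = ay = 0.0
        for y in range(h):
            for x in range(w):
                dx, dy = x - px, y - py
                d2 = dx * dx + dy * dy
                # Attraction toward each pixel, weakened where inhibited.
                m = mass[y][x] * (1.0 - inhibition[y][x])
                ax += m * dx / (d2 + softening)
                ay += m * dy / (d2 + softening)
                # Inhibition of return: suppress mass near the current focus.
                if d2 <= ior_radius * ior_radius:
                    inhibition[y][x] = min(1.0, inhibition[y][x] + ior_rate * dt)
        # Semi-implicit Euler: update velocity first, then position.
        vx += dt * (ax - damping * vx)
        vy += dt * (ay - damping * vy)
        px = max(0.0, min(w - 1.0, px + dt * vx))
        py = max(0.0, min(h - 1.0, py + dt * vy))
        path.append((px, py))
    return path


if __name__ == "__main__":
    # Toy frame: a bright 2x2 square on a dark 9x9 background.
    frame = [[0.0] * 9 for _ in range(9)]
    for row in (5, 6):
        for col in (5, 6):
            frame[row][col] = 1.0
    for point in simulate_scanpath(frame, (0.0, 0.0), steps=100)[::20]:
        print(point)
```

In this sketch the focus is pulled toward the edges of the bright square, and the growing inhibition around visited pixels gradually weakens their pull, which is one simple way to encourage the trajectory to move on to other regions. Accumulating the visited positions over time would give a rough saliency estimate, in the spirit of the integration over time mentioned above.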