Mengers Vito, Roth Nicolas, Brock Oliver, Obermayer Klaus, Rolfs Martin
Technische Universität Berlin, Berlin, Germany.
Science of Intelligence, Research Cluster of Excellence, Berlin, Germany.
J Vis. 2025 Feb 3;25(2):6. doi: 10.1167/jov.25.2.6.
The objects we perceive guide our eye movements when we observe real-world dynamic scenes. At the same time, gaze shifts and selective attention are critical for perceiving details and refining object boundaries. Object segmentation and gaze behavior are nevertheless typically treated as two independent processes. Here, we present a computational model that simulates these processes in an interconnected manner and allows for hypothesis-driven investigations of distinct attentional mechanisms. Drawing on an information processing pattern from robotics, we use a Bayesian filter to recursively segment the scene; the filter also provides an uncertainty estimate for the object boundaries, which we use to guide active scene exploration. We demonstrate that the model's behavior closely resembles observers' free-viewing behavior on a dataset of dynamic real-world scenes, as measured by scanpath statistics. These comprise the foveation duration and saccade amplitude distributions used for parameter fitting as well as higher-level statistics not used for fitting. The latter include how object detections, inspections, and returns are balanced, and a delay of returning saccades that arises without an explicit implementation of such temporal inhibition of return. Extensive simulations and ablation studies show that uncertainty promotes balanced exploration and that semantic object cues are crucial to forming the perceptual units used in object-based attention. Moreover, we show how our model's modular design allows for extensions, such as incorporating saccadic momentum or presaccadic attention, to further align its output with human scanpaths.
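The idea of recursively filtering a segmentation and using its uncertainty to choose where to look next can be illustrated with a toy example. The sketch below is not the authors' model: it assumes a discrete Bayes filter over object labels at each pixel, a simple diffusion toward uniform as the prediction step, and Shannon entropy as the uncertainty measure. All function names and the `diffusion` parameter are hypothetical and chosen only for illustration.

```python
import math


def predict(belief, diffusion=0.1):
    """Prediction step (assumed form): blend the label belief toward uniform."""
    k = len(belief)
    return [(1.0 - diffusion) * p + diffusion / k for p in belief]


def update(belief, likelihood):
    """Measurement step: Bayes' rule, posterior proportional to likelihood * prior."""
    unnorm = [l * p for l, p in zip(likelihood, belief)]
    total = sum(unnorm)
    if total == 0.0:
        # Observation rules out every label; keep the prior unchanged.
        return list(belief)
    return [u / total for u in unnorm]


def entropy(belief):
    """Shannon entropy (in nats) of one pixel's label belief."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


def filter_step(beliefs, likelihoods, diffusion=0.1):
    """Run one predict/update cycle independently for every pixel."""
    return [update(predict(b, diffusion), l)
            for b, l in zip(beliefs, likelihoods)]


def most_uncertain(beliefs):
    """Index of the pixel with the highest entropy (toy gaze target)."""
    ents = [entropy(b) for b in beliefs]
    return max(range(len(ents)), key=ents.__getitem__)


# Toy scene: three pixels, two candidate object labels, uniform prior.
beliefs = [[0.5, 0.5] for _ in range(3)]
# Hypothetical per-pixel observation likelihoods for each label.
likelihoods = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
beliefs = filter_step(beliefs, likelihoods)
target = most_uncertain(beliefs)
```

In this toy scene the middle pixel receives an uninformative observation, so its belief stays uniform and it is selected as the next target. A fuller treatment would couple neighboring pixels and weight uncertainty against other attentional cues, which this sketch deliberately omits.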