Xu Zhe-Xin, Pang Jiayi, Anzai Akiyuki, DeAngelis Gregory C
Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY, USA.
Department of Neurobiology, Harvard Medical School, Boston, MA, USA.
bioRxiv. 2025 May 19:2024.10.29.620928. doi: 10.1101/2024.10.29.620928.
Vision is an active process. We move our eyes and head to acquire useful information and to track objects of interest. While these movements are essential for many behaviors, they greatly complicate the analysis of retinal image motion: the image motion of an object reflects both how that object moves in the world and how the eye moves relative to the scene. Our brain must account for the visual consequences of self-motion to accurately perceive the 3D layout and motion of objects in the scene. Traditionally, compensation for eye movements (e.g., smooth pursuit) has been modeled as a simple vector subtraction process. While such models are effective for pure eye rotations and 2D scenes, we show that they fail for more natural viewing geometries that combine eye rotation and translation. We develop theoretical predictions for how perception of object motion and depth should depend on the observer's inferred viewing geometry. Through psychophysical experiments, we demonstrate novel perceptual biases that arise when different viewing geometries are simulated by optic flow, in the absence of physical eye movements. Remarkably, these biases occur automatically, without training or feedback, and are well predicted by our theoretical framework. A neural network model trained to perform the same tasks exhibits neural response patterns similar to those observed in macaque area MT, suggesting a possible neural basis for these adaptive computations. Our findings demonstrate that the visual system automatically infers viewing geometry from optic flow and flexibly attributes components of image motion to either self-motion or depth structure according to the inferred geometry. By showing that the visual consequences of self-motion play a crucial role in computing object motion and depth, these findings unify previously separate bodies of work and reveal how the visual system adaptively perceives a dynamic 3D environment.
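The geometric point at the heart of the abstract can be made explicit with the standard motion-field decomposition (after Longuet-Higgins & Prazdny, 1980); this is a minimal sketch under one common sign convention, not notation drawn from the paper itself. Assume a pinhole observer with unit focal length, an image point $(x, y)$ projecting from a scene point at depth $Z$, observer translation $\mathbf{T} = (T_x, T_y, T_z)$, and eye rotation $\boldsymbol{\Omega} = (\Omega_x, \Omega_y, \Omega_z)$; all symbols here are illustrative assumptions. The image velocity of the point is

\[
\begin{aligned}
% Translational component (scales with inverse depth) + rotational component (depth-independent)
v_x &= \frac{-T_x + x\,T_z}{Z} \;+\; x y\,\Omega_x \;-\; (1 + x^2)\,\Omega_y \;+\; y\,\Omega_z,\\[2pt]
v_y &= \frac{-T_y + y\,T_z}{Z} \;+\; (1 + y^2)\,\Omega_x \;-\; x y\,\Omega_y \;-\; x\,\Omega_z.
\end{aligned}
\]

The rotational terms do not depend on $Z$, so for a pure eye rotation a single subtracted vector field, fixed once the rotation is known, cancels them exactly at every image location; this is why vector-subtraction models succeed in that regime. The translational terms scale with $1/Z$, so when rotation is combined with translation no single subtracted vector can compensate at all depths, and the appropriate correction must itself depend on the inferred viewing geometry, which is the regime the abstract addresses.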