Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, United Kingdom.
PLoS Comput Biol. 2011 Aug;7(8):e1002142. doi: 10.1371/journal.pcbi.1002142. Epub 2011 Aug 18.
Stereo "3D" depth perception requires the visual system to extract binocular disparities between the two eyes' images. Several current models of this process, based on the known physiology of primary visual cortex (V1), do this by computing a piecewise-frontoparallel local cross-correlation between the left and right eye's images. The size of the "window" within which detectors examine the local cross-correlation corresponds to the receptive field size of V1 neurons. This basic model has successfully captured many aspects of human depth perception. In particular, it accounts for the low human stereoresolution for sinusoidal depth corrugations, suggesting that the limit on stereoresolution may be set in primary visual cortex. An important feature of the model, reflecting a key property of V1 neurons, is that the initial disparity encoding is performed by detectors tuned to locally uniform patches of disparity. Such detectors respond better to square-wave depth corrugations, since these are locally flat, than to sinusoidal corrugations which are slanted almost everywhere. Consequently, for any given window size, current models predict better performance for square-wave disparity corrugations than for sine-wave corrugations at high amplitudes. We have recently shown that this prediction is not borne out: humans perform no better with square-wave than with sine-wave corrugations, even at high amplitudes. The failure of this prediction raised the question of whether stereoresolution may actually be set at later stages of cortical processing, perhaps involving neurons tuned to disparity slant or curvature. Here we extend the local cross-correlation model to include existing physiological and psychophysical evidence indicating that larger disparities are detected by neurons with larger receptive fields (a size/disparity correlation). 
We show that this simple modification succeeds in reconciling the model with human results, confirming that stereoresolution for disparity gratings may indeed be limited by the size of receptive fields in primary visual cortex.
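The local cross-correlation scheme described above can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function names, the box-shaped window, the circular shift, and the linear rule in `window_for_disparity` (with its `base` and `gain` values) are all assumptions chosen for illustration. The sketch only shows the general idea that each detector correlates the two eyes' images within a window, and that the window may grow with the disparity the detector is tuned to.

```python
import numpy as np

def window_for_disparity(d, base=9, gain=4):
    """Hypothetical size/disparity rule: window width grows linearly with |d|.

    base and gain are illustrative values, not parameters from the paper.
    An odd base and an even gain keep the width odd, so the window is centred.
    """
    return base + gain * abs(int(d))

def local_cross_correlation(left, right, disparities, window_fn):
    """Windowed normalised cross-correlation between two 1-D image rows.

    Returns an array of shape (len(disparities), len(left)). Entry [k, x] is
    the correlation, inside a box window centred on x, between `left` and
    `right` shifted by disparities[k]. window_fn maps a disparity to a width.
    """
    out = np.empty((len(disparities), left.size))
    for k, d in enumerate(disparities):
        kernel = np.ones(window_fn(d)) / window_fn(d)

        def smooth(v):
            # Local average over the window (zero-padded at the ends).
            return np.convolve(v, kernel, mode="same")

        shifted = np.roll(right, d)  # circular shift, an assumption for simplicity
        mean_l, mean_r = smooth(left), smooth(shifted)
        cov = smooth(left * shifted) - mean_l * mean_r
        var_l = smooth(left ** 2) - mean_l ** 2
        var_r = smooth(shifted ** 2) - mean_r ** 2
        # Guard against division by zero in flat regions.
        out[k] = cov / np.sqrt(np.maximum(var_l * var_r, 1e-12))
    return out

def estimate_disparity(left, right, disparities, window_fn):
    """Pick, at each position, the candidate disparity with the highest correlation."""
    corr = local_cross_correlation(left, right, disparities, window_fn)
    return np.asarray(disparities)[np.argmax(corr, axis=0)]
```

Passing `lambda d: 21` as `window_fn` gives a fixed window for every detector, while passing `window_for_disparity` gives detectors tuned to larger disparities a wider window, in the spirit of the size/disparity correlation. A wider window averages over more of a disparity corrugation, which is the mechanism by which window size would limit stereoresolution in this kind of model.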