School of Psychological Science, University of Bristol, Bristol, UK.
Department of Cognitive Science and Psychology, New Bulgarian University, Sofia, Bulgaria.
J Vis. 2021 Feb 3;21(2):9. doi: 10.1167/jov.21.2.9.
Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near-complete invariance over 10 degrees and others reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized with high accuracy following translations of up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).
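To illustrate why a GAP layer can confer translation tolerance, the following is a minimal NumPy sketch, not the authors' model. It assumes a single hypothetical 3x3 filter, a toy 32x32 image, and wrap-around (circular) borders, so that shifting the input simply shifts the feature map. Averaging over all spatial positions then discards location, whereas the unpooled feature map, which a position-specific readout would see, changes with the shift.

```python
import numpy as np

def conv_relu(image, kernel):
    """Circular cross-correlation followed by ReLU.

    Wrap-around borders are an assumption made here so that
    translation equivariance holds exactly.
    """
    out = np.zeros_like(image, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # Shift the image so that pixel (y+i, x+j) lines up with (y, x).
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return np.maximum(out, 0.0)

def global_average_pool(feature_map):
    """Collapse the whole feature map to a single location-free value."""
    return float(feature_map.mean())

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))            # hypothetical filter weights

image = np.zeros((32, 32))
image[4:9, 5:10] = rng.random((5, 5))       # toy "object" at one location
shifted = np.roll(image, shift=(12, 15), axis=(0, 1))  # same object, new location

fmap_original = conv_relu(image, kernel)
fmap_shifted = conv_relu(shifted, kernel)

gap_original = global_average_pool(fmap_original)
gap_shifted = global_average_pool(fmap_shifted)

# The pooled responses agree; the unpooled feature maps do not.
gap_is_invariant = bool(np.isclose(gap_original, gap_shifted))
fmap_is_invariant = bool(np.allclose(fmap_original, fmap_shifted))
print(gap_is_invariant, fmap_is_invariant)
```

In networks with zero padding, strides, or objects near the image border, this invariance holds only approximately, so the sketch should be read as an idealized case.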