Center for Brains, Minds and Machines, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America.
Computer Science Department, Goethe University Frankfurt, Frankfurt am Main, Germany.
Sci Rep. 2020 Jan 29;10(1):1411. doi: 10.1038/s41598-019-57261-6.
Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons' receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.