Farzmahdi Amirhossein, Zarco Wilbert, Freiwald Winrich, Kriegeskorte Nikolaus, Golan Tal
bioRxiv. 2023 Jul 6:2023.01.05.522909. doi: 10.1101/2023.01.05.522909.
Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g., left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how it might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.
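The pooling argument in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code or model: the random "image", the random 5x5 filter, and the construction of a mirror-paired channel (a filter plus its left-right reflection) are illustrative assumptions, as are the helper names `correlate_valid`, `feature_maps`, and `pooled_response`. The sketch shows that mirroring the input mirrors the feature maps and swaps the paired channels (reflection equivariance), that a downstream unit pooling both channels over the whole map gives the same response to both views, and that a unit pooling only half of the map generally does not.

```python
import numpy as np


def correlate_valid(image, kernel):
    """2D 'valid' cross-correlation, the operation a CNN conv layer computes."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def mirror(x):
    """Horizontal reflection (left-right flip) of a 2D array."""
    return x[:, ::-1]


def feature_maps(image, kernel):
    """Hypothetical mirror-paired channels: a ReLU filter and its reflected twin."""
    a = np.maximum(correlate_valid(image, kernel), 0.0)
    b = np.maximum(correlate_valid(image, mirror(kernel)), 0.0)
    return a, b


def pooled_response(a, b, cols):
    """Hypothetical downstream unit: sums both channels over the columns in `cols`."""
    return a[:, cols].sum() + b[:, cols].sum()


rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))  # stand-in for one view of an object
kernel = rng.standard_normal((5, 5))   # arbitrary, asymmetric filter

a, b = feature_maps(image, kernel)
a_m, b_m = feature_maps(mirror(image), kernel)

# Reflection equivariance: mirroring the input mirrors each map and swaps the pair.
print(np.allclose(a_m, mirror(b)), np.allclose(b_m, mirror(a)))

# A unit whose receptive field spans the whole map responds identically to both views.
full = slice(None)
print(np.isclose(pooled_response(a, b, full), pooled_response(a_m, b_m, full)))

# A unit pooling only the left half of the map generally does not.
half = slice(0, a.shape[1] // 2)
print(np.isclose(pooled_response(a, b, half), pooled_response(a_m, b_m, half)))
```

In this sketch the invariance relies on the downstream unit weighting the two mirror-paired channels equally, which is an assumption made for illustration; with a single left-right symmetric filter, spatial pooling of one channel would suffice.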