Laboratory of Neural Systems, The Rockefeller University, New York, United States.
School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Islamic Republic of Iran.
Elife. 2024 Apr 25;13:e90256. doi: 10.7554/eLife.90256.
Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g. left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.
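The proposed mechanism (reflection-equivariant convolutional responses, followed by spatial pooling with large receptive fields) can be illustrated with a toy numpy sketch. This is not the paper's model; the hand-rolled `conv2d_valid` and the single filter pair are illustrative assumptions. It shows that if a layer's filter bank contains a filter and its mirrored copy, the feature maps of a mirrored image are the mirrored feature maps with the paired channels swapped (equivariance), and a downstream unit that pools over space and over the paired channels responds identically to an image and its mirror image (invariance).

```python
import numpy as np

def conv2d_valid(img, k):
    """Plain 'valid' cross-correlation of a 2D image with a 2D kernel."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
mirror = img[:, ::-1]                 # horizontally mirrored stimulus

w = rng.standard_normal((3, 3))       # arbitrary filter
w_flip = w[:, ::-1]                   # its mirror-reflected partner

# Feature maps for the original and mirrored image.
a, b = conv2d_valid(img, w), conv2d_valid(img, w_flip)
am, bm = conv2d_valid(mirror, w), conv2d_valid(mirror, w_flip)

# Equivariance: mirroring the input mirrors the feature maps and
# swaps the reflection-paired channels.
assert np.allclose(am, b[:, ::-1])
assert np.allclose(bm, a[:, ::-1])

# Invariance: a unit that pools spatially over both paired channels
# (a large receptive field) gives the same response to both views.
pooled = a.sum() + b.sum()
pooled_mirror = am.sum() + bm.sum()
assert np.isclose(pooled, pooled_mirror)
```

The pooling step is what turns an equivariant code into a mirror-symmetric one: before pooling, the responses to the two views differ (they are spatially mirrored); after summing over space and the paired channels, the distinction is discarded, analogous to the AL-like tuning described above.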