Center for Adaptive Systems, Department of Cognitive and Neural Systems, Center of Excellence for Learning in Education, Science, and Technology, Boston University, 677 Beacon Street, Boston, MA 02215, USA
Neural Netw. 2011 Dec;24(10):1050-61. doi: 10.1016/j.neunet.2011.04.004. Epub 2011 Apr 22.
All primates depend for their survival on being able to rapidly learn about and recognize objects. Objects may be visually detected at multiple positions, sizes, and viewpoints. How does the brain rapidly learn and recognize objects while scanning a scene with eye movements, without causing a combinatorial explosion in the number of cells that are needed? How does the brain avoid the problem of erroneously classifying parts of different objects together at the same or different positions in a visual scene? In monkeys and humans, a key area for such invariant object category learning and recognition is the inferotemporal cortex (IT). A neural model is proposed to explain how spatial and object attention coordinate the ability of IT to learn invariant category representations of objects that are seen at multiple positions, sizes, and viewpoints. The model clarifies how interactions within a hierarchy of processing stages in the visual brain accomplish this. These stages include the retina, lateral geniculate nucleus, and cortical areas V1, V2, V4, and IT in the brain's What cortical stream, as they interact with spatial attention processes within the parietal cortex of the Where cortical stream. The model builds upon the ARTSCAN model, which proposed how view-invariant object representations are generated. The positional ARTSCAN (pARTSCAN) model proposes how the following additional processes in the What cortical processing stream also enable position-invariant object representations to be learned: IT cells with persistent activity, and a combination of normalizing object category competition and a view-to-object learning law which together ensure that unambiguous views have a larger effect on object recognition than ambiguous views. The model explains how such invariant learning can be fooled when monkeys, or other primates, are presented with an object that is swapped with another object during eye movements to foveate the original object. 
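The mechanism named above, normalizing object category competition combined with a view-to-object learning law, can be illustrated with a toy sketch. This is not the paper's implementation: the shunting on-center off-surround dynamics, the instar-style gated update, and every parameter (A, B, lr, the step sizes) are illustrative assumptions chosen in the general spirit of such models.

```python
import numpy as np

def normalizing_competition(inputs, steps=50, dt=0.1, A=1.0, B=1.0):
    """Toy shunting competition: each category activity is excited by its
    own input and inhibited by all other inputs, so activities stay bounded
    and are normalized across the category field."""
    x = np.zeros_like(inputs, dtype=float)
    for _ in range(steps):
        excite = (B - x) * inputs               # shunted self-excitation
        inhibit = x * (inputs.sum() - inputs)   # inhibition from other inputs
        x += dt * (-A * x + excite - inhibit)
    return x

def view_to_object_update(w, view, category_act, lr=0.5):
    """Hypothetical view-to-object learning law: the weight change is gated
    by the category activation, so views that win the normalized competition
    decisively (unambiguous views) produce larger updates than views that
    leave the competition nearly tied (ambiguous views)."""
    return w + lr * category_act * (view - w)
```

At equilibrium the shunting field drives each activity toward B*I/(A + sum(I)), so total activity is bounded regardless of how many categories are active, and a stronger (less contested) input retains a proportionally larger activation, which in turn gates a larger weight update. This is how, in the sketch, unambiguous views come to dominate learned object recognition.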
The swapping procedure is predicted to prevent the reset of spatial attention that would otherwise keep the representations of multiple objects from being combined by learning. Li and DiCarlo (2008) presented neurophysiological data from monkeys showing how unsupervised natural experience in a target-swapping experiment can rapidly alter object representations in IT. The model quantitatively simulates these swapping data by showing how the swapping procedure fools the spatial attention mechanism. More generally, the model provides a unifying framework, and testable predictions in both monkeys and humans, for understanding object learning data obtained with neurophysiological methods in monkeys, and spatial attention, episodic learning, and memory retrieval data obtained with functional imaging methods in humans.
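The predicted logic of the swap manipulation can be caricatured in a few lines. This toy simulation is not the pARTSCAN model: the boolean reset flags, the one-hot view codes, and the learning rate are hypothetical. It only illustrates the prediction that preventing the attentional reset at an object change merges the views of two different objects into a single learned category, whereas normal viewing keeps them separate.

```python
import numpy as np

def learn_views(view_seq, reset_flags, n_views=4, lr=0.5):
    """Toy association of view codes to object categories.
    reset_flags[i] is True when spatial attention resets before view i,
    opening a new object category; when the reset is prevented (as the
    swap manipulation is predicted to do), the next view is linked to
    the currently active category instead."""
    weights = {}   # category id -> learned view-to-object weights
    current = -1
    for view, reset in zip(view_seq, reset_flags):
        if reset or current < 0:
            current = len(weights)
            weights[current] = np.zeros(n_views)
        v = np.eye(n_views)[view]                      # one-hot view code
        weights[current] += lr * (v - weights[current])  # gated update
    return weights

# Normal viewing: attention resets at the object boundary (before view 2),
# so views 0-1 and views 2-3 are learned by two separate categories.
normal = learn_views([0, 1, 2, 3], [True, False, True, False])

# Swap condition: the reset at the object boundary is prevented, so all
# four views, belonging to two different objects, merge into one category.
swapped = learn_views([0, 1, 2, 3], [True, False, False, False])
```

In the swapped run the single remaining category carries nonzero weights for views of both objects, which is the toy analogue of the altered IT representations reported in the swap experiment.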