J Cogn Neurosci. 1992 Winter;4(1):35-57. doi: 10.1162/jocn.1992.4.1.35.
A new type of biologically inspired multilayered network is proposed to model the properties of the primate visual system with respect to invariant visual recognition (IVR). This model is based on 10 major neurobiological and psychological constraints. The first five constraints shape the architecture and properties of the network. 1. The network model has a Y-like double-branched multilayered architecture, with one input (the retina) and two parallel outputs, the "What" and the "Where," which model, respectively, the temporal pathway, specialized for "object" identification, and the parietal pathway, specialized for "spatial" localization. 2. Four processing layers are sufficient to model the main functional steps of the primate visual system that transform the retinal information into prototypes (object-centered reference frame) in the "What" branch and into an oculomotor command in the "Where" branch. 3. The distribution of receptive field sizes within and between the two functional pathways provides an appropriate tradeoff between discrimination and invariant recognition capabilities. 4. The two outputs are represented by population coding: the ocular command is computed as a population vector in the "Where" branch and the prototypes are coded in a "semidistributed" way in the "What" branch. In the intermediate associative steps, processing units learn to associate prototypes (through feedback connections) with component features (through feedforward ones). 5. The basic processing units of the network do not model single cells but model the local neuronal circuits that combine different information flows organized in separate cortical layers. Such a biologically constrained model shows shift-invariant and size-invariant capabilities that resemble those of humans (psychological constraints): 6. During the Learning session, a set of patterns (26 capital letters and 2 geometric figures) is presented to the network: a single presentation of each pattern in one position (at the center) and with one size is sufficient to learn the corresponding prototypes (internal representations). These patterns are then presented in widely varying new sizes and positions during the Recognition session: 7. The "What" branch of the network succeeds in immediate recognition for patterns presented in the central zone of the retina with the learned size. 8. The recognition by the "What" branch is resistant to changes in size within a limited range of variation related to the distribution of receptive field (RF) sizes in the successive processing steps of this pathway. 9. Even when ocular movements are not allowed, the recognition capabilities of the "What" branch are unaffected by changing positions around the learned one. This significant shift-invariance of the "What" branch is also related to the distribution of RF sizes. 10. When both sizes and locations vary, the "What" and the "Where" branches cooperate for recognition: the location coding in the "Where" branch can command, under the control of the "What" branch, an ocular movement that brings peripheral patterns back toward the central zone of the retina until recognition succeeds. This model results in predictions about anatomical connections and physiological interactions between temporal and parietal cortices.
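Constraint 4 states that the ocular command is computed as a population vector, but the abstract gives no equations. The sketch below illustrates the generic population-vector idea only: each unit votes for its preferred direction, weighted by its activity, and the votes are summed. The unit count, the rectified-cosine tuning curve, and all function names are assumptions made for illustration and are not taken from the model.

```python
import math

def population_vector(preferred_directions, activities):
    """Activity-weighted sum of the units' preferred directions (2-D).

    preferred_directions: angles in radians, one per unit (hypothetical units).
    activities: non-negative activity of each unit.
    """
    x = sum(a * math.cos(d) for d, a in zip(preferred_directions, activities))
    y = sum(a * math.sin(d) for d, a in zip(preferred_directions, activities))
    return x, y

def cosine_tuned_activities(preferred_directions, target_angle):
    """Assumed tuning curve: rectified cosine of the angular distance
    between the target direction and each unit's preferred direction."""
    return [max(0.0, math.cos(target_angle - d)) for d in preferred_directions]

# Eight hypothetical units with evenly spaced preferred directions.
preferred = [2 * math.pi * k / 8 for k in range(8)]

# A pattern lying at 30 degrees from the retinal center (illustrative value).
target = math.radians(30)
acts = cosine_tuned_activities(preferred, target)

# Direction of the decoded vector: the direction of the ocular command.
vx, vy = population_vector(preferred, acts)
decoded_deg = math.degrees(math.atan2(vy, vx))
print(round(decoded_deg, 3))  # close to 30 for this evenly spaced population
```

No single unit needs to be tuned exactly to the target direction: the direction is carried by the pattern of activity across the whole population, which is the sense in which the output is population coded.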