Department of Computer Science, University of Warwick, Coventry, UK.
Department of Computer Science, University of Warwick, Coventry, UK; Oxford Centre for Computational Neuroscience, Oxford, UK.
Front Comput Neurosci. 2014 Apr 1;8:37. doi: 10.3389/fncom.2014.00037. eCollection 2014.
When we see a human sitting down, standing up, or walking, we can recognize one of these poses independently of the individual, or we can recognize the individual person independently of the pose. The same issues arise for deforming objects. For example, if we see a flag deformed by the wind, either blowing out or hanging languidly, we can usually recognize the flag independently of its deformation; or we can recognize the deformation independently of the identity of the flag. We hypothesize that these types of recognition can be implemented by the primate visual system using, as a learning principle, the temporo-spatial continuity of objects as they transform. In particular, we hypothesize that pose or deformation can be learned under conditions in which large numbers of different people are successively seen in the same pose, or objects in the same deformation. We also hypothesize that person-specific representations that are independent of pose, and object-specific representations that are independent of deformation and view, could be built when individual people or objects are observed successively transforming from one pose or deformation and view to another. These hypotheses were tested in a simulation of the ventral visual system, VisNet, which uses temporal continuity, implemented in a synaptic learning rule with a short-term memory trace of previous neuronal activity, to learn invariant representations. It was found that, depending on the statistics of the visual input, either pose-specific or deformation-specific representations could be built that were invariant with respect to individual and view; or identity-specific representations could be built that were invariant with respect to pose or deformation and view. We propose that this is how pose-specific and pose-invariant, and deformation-specific and deformation-invariant, perceptual representations are built in the brain.
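The synaptic learning rule with a short-term memory trace described above can be sketched in a minimal form. The following Python snippet is an illustrative trace-rule Hebbian update on a single linear output neuron; the function name, parameter names, and values are assumptions for illustration, not VisNet's actual configuration. The idea is that a decaying trace of the neuron's recent activity links successive inputs, so transforms of the same object seen close together in time are bound onto the same weights:

```python
import numpy as np

def trace_rule_update(w, inputs, alpha=0.1, eta=0.8):
    """Apply a trace-style Hebbian update over a temporal sequence.

    y_bar carries a short-term memory of previous post-synaptic
    activity, so inputs presented close together in time strengthen
    the same output weights (an illustrative sketch, not VisNet's
    exact rule).
    """
    w = np.asarray(w, dtype=float).copy()
    y_bar = 0.0
    for x in inputs:
        x = np.asarray(x, dtype=float)
        y = float(w @ x)                       # post-synaptic activation
        y_bar = (1.0 - eta) * y + eta * y_bar  # decaying memory trace
        w = w + alpha * y_bar * x              # Hebbian update using the trace
        w = w / np.linalg.norm(w)              # keep weights bounded
    return w

# Two successive "views" of the same object (hypothetical input vectors):
views = [np.array([1.0, 0.0, 0.0]), np.array([0.8, 0.6, 0.0])]
w = trace_rule_update(np.ones(3) / np.sqrt(3.0), views)
```

Because the trace spans both presentations, the weights strengthen on the input dimensions shared by the two views, while the unstimulated dimension shrinks under normalization; presenting different objects in temporal succession would instead bind their shared pose or deformation features.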