Cooper E E, Biederman I, Hummel J E
University of Minnesota.
Can J Psychol. 1992 Jun;46(2):191-214. doi: 10.1037/h0084317.
Phenomenologically, human shape recognition appears to be invariant with changes in orientation in depth (up to parts occlusion), position in the visual field, and size. Recent versions of template theories (e.g., Ullman, 1989; Lowe, 1987) assume that these invariances are achieved by applying transformations such as rotation, translation, and scaling to the image so that it can be matched metrically to a stored template. Presumably, such transformations would require time for their execution. We describe recent priming experiments that assess the effects of a prior brief presentation of an image on its subsequent recognition. The results of these experiments indicate that the invariance is complete: The magnitude of visual priming (as distinct from name or basic-level concept priming) is not affected by a change in position, size, orientation in depth, or the particular lines and vertices present in the image, as long as representations of the same components can be activated. An implemented seven-layer neural network model (Hummel & Biederman, 1992) that captures these fundamental properties of human object recognition is described. Given a line drawing of an object, the model activates a viewpoint-invariant structural description of the object, specifying its parts and their interrelations. Visual priming is interpreted as a change in connection weights at one of two loci: (a) the weights that activate cells, termed geon feature assemblies (GFAs), which conjoin the output of units representing invariant, independent properties of a single geon and its relations (such as its type, aspect ratio, and relations to other geons), or (b) the weights by which several GFAs activate a cell representing an object.
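The weight-change account of priming can be illustrated with a toy sketch. This is not the Hummel & Biederman (1992) network: the names `GFA`, `describe`, and `ObjectUnit`, the dictionary layout of an "image", the initial weights, and the learning rate are all assumptions made for illustration. The sketch only shows why strengthening GFA-to-object weights would yield equal priming for views that differ in position or size, since such views map onto the same GFAs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GFA:
    """Hypothetical geon feature assembly: conjoins invariant geon properties."""
    geon_type: str
    aspect: str
    relations: frozenset


def describe(image):
    """Build a structural description; position and size are deliberately ignored."""
    return frozenset(
        GFA(p["type"], p["aspect"], frozenset(p["relations"]))
        for p in image["parts"]
    )


@dataclass
class ObjectUnit:
    """Hypothetical object cell driven by weighted input from GFAs."""
    name: str
    weights: dict = field(default_factory=dict)

    def activation(self, gfas):
        # Summed weighted input from the GFAs that are currently active.
        return sum(self.weights.get(g, 0.0) for g in gfas)

    def prime(self, gfas, rate=0.5):
        # Priming modeled as strengthening the connections that were just used.
        for g in gfas:
            if g in self.weights:
                self.weights[g] += rate


parts = [
    {"type": "cylinder", "aspect": "elongated", "relations": ["below cone"]},
    {"type": "cone", "aspect": "squat", "relations": ["above cylinder"]},
]
# Same object at two positions and sizes (toy values).
view_a = {"position": (0, 0), "size": 1.0, "parts": parts}
view_b = {"position": (5, 3), "size": 2.0, "parts": parts}

lamp = ObjectUnit("lamp", {g: 1.0 for g in describe(view_a)})
before = lamp.activation(describe(view_b))   # 2.0
lamp.prime(describe(view_a))                 # prime with view A only
after = lamp.activation(describe(view_b))    # 3.0: view B benefits fully
```

In this sketch an image built from different geons would activate no primed connections and so would gain nothing, which parallels the abstract's condition that priming holds as long as representations of the same components can be activated.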