Biederman I, Kalocsai P
University of Southern California, Department of Psychology and Neuroscience Program, Los Angeles 90089-2520, USA.
Philos Trans R Soc Lond B Biol Sci. 1997 Aug 29;352(1358):1203-19. doi: 10.1098/rstb.1997.0103.
A number of behavioural phenomena distinguish the recognition of faces and objects, even when members of a set of objects are highly similar. Because faces have the same parts in approximately the same relations, individuation of faces typically requires specification of the metric variation in a holistic and integral representation of the facial surface. The direct mapping of a hypercolumn-like pattern of activation onto a representation layer that preserves relative spatial filter values in a two-dimensional (2D) coordinate space, as proposed by C. von der Malsburg and his associates, may account for many of the phenomena associated with face recognition. An additional refinement, in which each column of filters (termed a 'jet') is centred on a particular facial feature (or fiducial point), allows selectivity of the input into the holistic representation to avoid incorporation of occluding or nearby surfaces. The initial hypercolumn representation also characterizes the first stage of object perception, but the image variation for objects at a given location in a 2D coordinate space may be too great to yield sufficient predictability directly from the output of spatial kernels. Consequently, objects can be represented by a structural description specifying qualitative (typically, non-accidental) characterizations of an object's parts, the attributes of the parts, and the relations among the parts, largely based on orientation and depth discontinuities (as shown by Hummel & Biederman). A series of experiments on the name priming or physical matching of complementary images (in the Fourier domain) of objects and faces documents that whereas face recognition is strongly dependent on the original spatial filter values, evidence from object recognition indicates strong invariance to these values, even when distinguishing among objects that are as similar as faces.
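As a rough illustration of the 'jet' idea described above, the following sketch computes a column of filter responses centred on one image location and compares two such columns. It is only a sketch under stated assumptions, not the authors' implementation: the use of complex Gabor kernels as the spatial filters, the specific wavelengths, orientation count and kernel size, the normalised dot product as a similarity measure, and the function names `gabor_kernel`, `jet` and `jet_similarity` are all illustrative choices that the abstract does not specify.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: a plane wave under a Gaussian envelope (assumed filter type)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction in which the wave varies
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / wavelength)
    return envelope * carrier

def jet(image, row, col, wavelengths=(4, 8, 16), n_orient=8, size=33):
    """Response magnitudes of all filters centred on one point (row, col).

    The parameter defaults are hypothetical, chosen only for illustration.
    """
    half = size // 2
    # Pad so that a full patch exists even for points near the border
    padded = np.pad(image.astype(float), half, mode="reflect")
    patch = padded[row:row + size, col:col + size]  # centred on (row, col)
    magnitudes = []
    for wavelength in wavelengths:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            kernel = gabor_kernel(size, wavelength, theta, sigma=0.5 * wavelength)
            magnitudes.append(abs(np.sum(patch * kernel)))
    return np.array(magnitudes)

def jet_similarity(a, b):
    """Normalised dot product of two jets (an assumed similarity measure)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Usage: compare the jets at two hypothetical fiducial points of a random test image
rng = np.random.default_rng(0)
image = rng.random((64, 64))
left_point = jet(image, 20, 20)
right_point = jet(image, 40, 45)
print(jet_similarity(left_point, right_point))
```

Because each jet is anchored to a chosen point rather than to a fixed grid position, a representation built from such jets can in principle leave out locations that fall on occluding or neighbouring surfaces, which is the selectivity the abstract attributes to the fiducial-point refinement.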