Fiser J, Biederman I, Cooper E E
Department of Psychology and Computer Science, University of Southern California, Los Angeles 90089-2520, USA.
Spat Vis. 1996;10(3):237-71. doi: 10.1163/156856896x00150.
A number of recent successful models of face recognition posit only two layers: an input layer consisting of a lattice of spatial filters, and a single subsequent stage in which the filter output values are mapped directly onto an object representation layer by standard matching methods such as stochastic optimization. Is this approach sufficient for modeling human object recognition? We tested whether a highly efficient version of such a two-layer model would manifest effects similar to those shown by humans when given the task of recognizing images of objects that had been employed in a series of psychophysical experiments. System accuracy was quite high overall, but the system's performance was qualitatively different from that evidenced by humans in object recognition tasks. The discrepancy between the system's performance and human performance is likely to be revealed by all models that map filter values directly onto object units. These results suggest that human object recognition (as opposed to face recognition) may be difficult to approximate by models that do not posit hidden units for explicit representation of intermediate entities such as edges, viewpoint-invariant classifiers, axes, shocks, and object parts.
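To make the two-layer idea concrete, the following is a minimal sketch and not the system evaluated in the study. It assumes complex Gabor kernels as the spatial filters, a fixed rectangular sampling lattice, and cosine similarity as the matching rule; the stochastic optimization mentioned above is replaced here by a plain nearest-neighbor comparison. All function names and parameter values (`gabor_kernel`, `filter_responses`, `recognize`, the lattice step, the wavelengths) are hypothetical choices made for illustration.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a plane-wave carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the carrier
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / wavelength)
    return envelope * carrier


def filter_responses(image, step=8, size=15, wavelengths=(4, 8), n_orient=4):
    """Layer 1: response magnitudes of a Gabor bank sampled on a lattice."""
    half = size // 2
    kernels = [
        gabor_kernel(size, w, k * np.pi / n_orient, sigma=w / 2.0)
        for w in wavelengths
        for k in range(n_orient)
    ]
    feats = []
    rows, cols = image.shape
    for r in range(half, rows - half, step):
        for c in range(half, cols - half, step):
            patch = image[r - half:r + half + 1, c - half:c + half + 1]
            feats.extend(abs(np.sum(patch * k)) for k in kernels)
    return np.asarray(feats)


def similarity(a, b):
    """Cosine similarity between two filter-response vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def recognize(image, gallery):
    """Layer 2: map filter values directly onto the best-matching stored object."""
    probe = filter_responses(image)
    return max(gallery, key=lambda name: similarity(probe, gallery[name]))


# Toy usage: two stored "objects" (a vertical and a horizontal bar).
vertical = np.zeros((32, 32))
vertical[:, 14:18] = 1.0
horizontal = np.zeros((32, 32))
horizontal[14:18, :] = 1.0
gallery = {
    "vertical": filter_responses(vertical),
    "horizontal": filter_responses(horizontal),
}
noisy_probe = vertical + 0.05 * np.random.default_rng(0).standard_normal((32, 32))
label = recognize(noisy_probe, gallery)
```

Note that nothing in this sketch sits between the filter values and the object labels: there is no explicit representation of edges, axes, or parts, which is the property of such models that the abstract calls into question.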