Vetter T, Poggio T, Bülthoff H H
Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge 02139.
Curr Biol. 1994 Jan 1;4(1):18-23. doi: 10.1016/s0960-9822(00)00004-x.
Human observers can recognize three-dimensional objects seen in novel orientations, even when they have previously seen only a relatively small number of different views of the object. How our visual system does this is a key problem in vision research. Recent theories and experiments suggest that the human visual system might store a relatively small number of sample two-dimensional views of a three-dimensional object, and recognize novel views by a process of interpolation between the stored sample views. These sample views may be collected during a training phase as the visual system familiarizes itself with the object.
Here, we investigate whether constraints on the shapes of objects commonly encountered in the real world can reduce the number of training views required for recognition of three-dimensional objects. We are particularly concerned with the constraint of object symmetry. We show that if an object is bilaterally symmetric, then additional 'virtual views' can automatically be generated from one sample view by symmetry transformations. When only a single sample view has been seen, these virtual views should make novel views of a symmetric object easier to recognize than novel views of an asymmetric object. Recognition should be particularly facilitated when the novel views are close to the virtual view. We present psychophysical results that bear out these predictions.
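The virtual-view idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: orthographic projection, rotation about the vertical axis only, a symmetry plane at x = 0, and known correspondences between symmetric point pairs. The names `project`, `virtual_view`, `partner` and the point coordinates are hypothetical. Under these assumptions, mirroring one sample view and swapping each point with its symmetric partner reproduces the view of the same object rotated by the opposite angle.

```python
import math

def project(points, theta_deg):
    """Orthographic 2D view after rotating the object about the vertical (y) axis."""
    t = math.radians(theta_deg)
    return [(x * math.cos(t) + z * math.sin(t), y) for x, y, z in points]

def virtual_view(view, partner):
    """Mirror the image about the vertical axis and swap each point
    with its bilaterally symmetric partner (assumed known)."""
    return [(-view[partner[i]][0], view[partner[i]][1]) for i in range(len(view))]

def max_diff(a, b):
    """Largest per-point distance (L1) between two views with matching labels."""
    return max(abs(p[0] - q[0]) + abs(p[1] - q[1]) for p, q in zip(a, b))

# Hypothetical bilaterally symmetric object: point i and point partner[i]
# are mirror images of each other across the plane x = 0.
half = [(1.0, 0.0, 0.5), (0.6, 1.2, -0.3), (0.3, -0.8, 0.9)]
points = half + [(-x, y, z) for x, y, z in half]
partner = [3, 4, 5, 0, 1, 2]

sample = project(points, 20.0)            # the single sample view
virtual = virtual_view(sample, partner)   # generated without seeing the object again
opposite = project(points, -20.0)         # the real view from the mirrored pose

print(max_diff(virtual, opposite) < 1e-9)
```

In this toy setting the virtual view coincides with the real view at the mirrored pose, which is why novel views near that pose would be expected to benefit most. For an asymmetric object no such partner mapping exists, so the same sample view yields no additional valid view.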
Our results show that the human visual system can indeed exploit symmetry to facilitate object recognition, and support the model for object recognition in which a small number of two-dimensional views are remembered and combined to recognize novel views of the same object. These results raise questions about how symmetry is recognized, and symmetry transformations implemented, in real, biological neural networks.