Logothetis N K, Pauls J, Bülthoff H H, Poggio T
Division of Neuroscience, Baylor College of Medicine, Houston, Texas 77030.
Curr Biol. 1994 May 1;4(5):401-14. doi: 10.1016/s0960-9822(00)00089-0.
How do we recognize visually perceived three-dimensional objects, particularly when they are seen from novel viewpoints? Recent psychophysical studies suggest that the human visual system may store a relatively small number of two-dimensional views of a three-dimensional object and recognize novel views of the object by interpolating between the stored sample views. Investigating the neural mechanisms underlying this process requires physiological experiments; as a prelude to such experiments, we asked whether the observations made with human observers extend to monkeys.
We trained monkeys to recognize computer-generated images of objects presented from an arbitrarily chosen training view and containing sufficient three-dimensional information to specify the object's structure. We then tested the trained monkeys' ability to generalize recognition of the object to views generated by rotating the target object around an arbitrary axis. The monkeys recognized as the target only those two-dimensional views that were close to the familiar training view. Recognition became increasingly difficult as the stimulus was rotated away from the experienced viewpoint, and failed for views farther than about 40 degrees from the training view. This suggests that, in the early stages of learning to recognize a previously unfamiliar object, the monkeys build two-dimensional, viewer-centered object representations rather than a three-dimensional model of the object. When the animals were trained with as few as three views of the object, 120 degrees apart, they could often recognize all the views of the object resulting from rotations around the same axis.
Our experiments show that recognition of novel three-dimensional objects is a function of the object's retinal projection. This suggests that non-human primates, like humans, may accomplish view-invariant recognition of familiar objects by a viewer-centered system that interpolates between a small number of stored views. The measures of recognition performance can be simulated by a regularization network that stores a few familiar views and is endowed with the ability to interpolate between them. Our results provide the basis for physiological studies of object recognition in monkeys and suggest that the insights gained from such studies should also apply to humans.
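The view-interpolation idea can be illustrated with a toy model. The sketch below is not the regularization network used in the study: it sums Gaussian units centered on stored views with unit weights instead of fitted coefficients, and it treats a view as a single rotation angle about one axis. The function names, the tuning width `sigma`, and the recognition `threshold` are all hypothetical, chosen only so that a single stored view generalizes to about 40 degrees.

```python
import math


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def activation(view, stored_views, sigma=40.0):
    """Summed response of Gaussian units, one unit per stored view.

    sigma is a hypothetical tuning width in degrees, not a value from the paper.
    Unit weights are a simplification; a regularization network would fit them.
    """
    return sum(
        math.exp(-angular_distance(view, s) ** 2 / (2.0 * sigma ** 2))
        for s in stored_views
    )


def recognized(view, stored_views, sigma=40.0, threshold=0.6):
    """True if the summed response reaches the (hypothetical) threshold."""
    return activation(view, stored_views, sigma) >= threshold


def recognized_views(stored_views, step=5):
    """Test angles (every `step` degrees around one axis) that are recognized."""
    return [v for v in range(0, 360, step) if recognized(v, stored_views)]


if __name__ == "__main__":
    one_view = recognized_views([0])
    three_views = recognized_views([0, 120, 240])
    print("one stored view:", len(one_view), "of 72 test views recognized")
    print("three stored views:", len(three_views), "of 72 test views recognized")
```

Under these assumed parameters, a single stored view is recognized only within roughly 40 degrees of the training view, while three stored views 120 degrees apart cover every tested rotation about the same axis, because the responses of neighboring units add up in the gaps between them. The numbers reflect the chosen `sigma` and `threshold`, not fitted data.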