Brain and Intelligent Systems Research Laboratory, Department of Electrical and Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran; School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran; Department of Physiology, Monash University, Melbourne, VIC, Australia.
Brain and Intelligent Systems Research Laboratory, Department of Electrical and Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran; School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran; Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran.
Front Comput Neurosci. 2014 Jul 18;8:74. doi: 10.3389/fncom.2014.00074. eCollection 2014.
Invariant object recognition is a remarkable ability of the primate visual system, and its underlying mechanism has long been under intense investigation. Computational modeling is a valuable tool for understanding the processes involved in invariant object recognition. Although recent computational models have shown outstanding performance on challenging image databases, they fail to perform well in image categorization under more complex image variations. Studies have shown that building sparse representations of objects by extracting more informative visual features through a feedforward sweep can lead to higher recognition performance. Here, however, we show that when the complexity of image variations is high, even this approach results in poor performance compared to humans. To assess the performance of models and humans in invariant object recognition tasks, we built a parametrically controlled image database consisting of several object categories varied in different dimensions and levels, rendered from 3D planes. Comparing the performance of several object recognition models with that of human observers shows that the models perform similarly to humans in categorization tasks only under low-level image variations. Furthermore, the results of our behavioral experiments demonstrate that, even under difficult experimental conditions (i.e., briefly presented masked stimuli with complex image variations), human observers performed outstandingly well, suggesting that the models are still far from resembling humans in invariant object recognition. Taken together, we suggest that learning sparse informative visual features, although desirable, is not a complete solution for future progress in object-vision modeling. We show that this approach is not of significant help in solving the computational crux of object recognition (i.e., invariant object recognition) when the identity-preserving image variations become more complex.