IEEE Trans Cybern. 2015 Nov;45(11):2612-24. doi: 10.1109/TCYB.2014.2377196. Epub 2014 Dec 18.
Recently, many computational models have been proposed to simulate the visual cognition process. For example, the hierarchical Max-Pooling (HMAX) model was proposed based on the hierarchical, bottom-up structure of areas V1 to V4 in the ventral pathway of the primate visual cortex, and it achieves position- and scale-tolerant recognition. In our previous work, we introduced memory and association into the HMAX model to simulate the visual cognition process. In this paper, we improve our theoretical framework by mimicking more elaborate structures and functions of the primate visual cortex. We mainly focus on the new formation of memory and association in visual processing under different circumstances, as well as on preliminary cognition and active adjustment in the inferior temporal cortex, which are absent from the HMAX model. The main contributions of this paper are as follows. 1) In the memory and association part, we apply deep convolutional neural networks to extract various episodic features of objects, since people use different features for object recognition. Moreover, to achieve fast and robust recognition in the retrieval and association process, different types of features are stored in separate clusters, and the feature binding of the same object is stimulated in a loop discharge manner. 2) In the preliminary cognition and active adjustment part, we introduce preliminary cognition to classify different types of objects, since distinct neural circuits in the human brain are used to identify various types of objects. Furthermore, active cognition adjustment for occlusion and orientation is implemented in the model to mimic the top-down effect in the human cognition process. Finally, our model is evaluated on two face databases, CAS-PEAL-R1 and AR. The results demonstrate that our model performs visual recognition efficiently, with a much lower memory storage requirement and better performance than traditional, purely computational methods.
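To illustrate how HMAX obtains position and scale tolerance, here is a minimal NumPy sketch of its first two stages. S1 units perform Gabor template matching, and C1 units take a max over two adjacent scales and over local positions. This is a simplified illustration of the generic HMAX idea, not the authors' extended model with memory, association, or active adjustment. The function names (`gabor`, `s1_layer`, `local_max_pool`, `c1_layer`), the filter sizes and Gabor parameters, the single orientation, and the crop-instead-of-resample step are all illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor(size, wavelength, theta, sigma, gamma=0.3):
    """Zero-mean, unit-norm Gabor filter used as an S1-style template."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates to the preferred orientation theta.
    x0 = x * np.cos(theta) + y * np.sin(theta)
    y0 = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(x0**2 + gamma**2 * y0**2) / (2 * sigma**2))
    g = g * np.cos(2 * np.pi * x0 / wavelength)
    g -= g.mean()
    return g / np.linalg.norm(g)


def s1_layer(image, filt):
    """S1: absolute filter response at every valid image position."""
    windows = sliding_window_view(image, filt.shape)  # (H-k+1, W-k+1, k, k)
    return np.abs(np.einsum("ijkl,kl->ij", windows, filt))


def local_max_pool(resp, pool):
    """Max over non-overlapping pool x pool cells (position tolerance)."""
    h = (resp.shape[0] // pool) * pool
    w = (resp.shape[1] // pool) * pool
    cells = resp[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return cells.max(axis=(1, 3))


def c1_layer(s1_small, s1_large, pool=4):
    """C1: max over two adjacent scales (scale tolerance), then over positions."""
    # Simplification (assumption): crop both maps to a common size
    # instead of resampling them onto a shared grid.
    h = min(s1_small.shape[0], s1_large.shape[0])
    w = min(s1_small.shape[1], s1_large.shape[1])
    merged = np.maximum(s1_small[:h, :w], s1_large[:h, :w])
    return local_max_pool(merged, pool)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    theta = 0.0  # a single orientation; HMAX normally uses several
    small = s1_layer(image, gabor(7, wavelength=3.5, theta=theta, sigma=2.8))
    large = s1_layer(image, gabor(9, wavelength=4.6, theta=theta, sigma=3.6))
    c1 = c1_layer(small, large)
    print(small.shape, large.shape, c1.shape)  # (26, 26) (24, 24) (6, 6)
```

Each C1 unit reports only the strongest S1 response within its cell and across the two scales. As a result, a feature that shifts slightly within a cell, or that appears at the neighboring scale, produces the same C1 value. This max operation is the mechanism behind the tolerance described above.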