Li Yinlin, Wu Wei, Zhang Bo, Li Fengfu
State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
Front Comput Neurosci. 2015 Oct 7;9:123. doi: 10.3389/fncom.2015.00123. eCollection 2015.
In recent years, interdisciplinary research between neuroscience and computer vision has advanced both fields. Many biologically inspired visual models have been proposed. Among them, the Hierarchical Max-pooling model (HMAX) is a feedforward model that mimics the structures and functions of the primate visual cortex from V1 to the posterior inferotemporal (PIT) layer and generates a series of position- and scale-invariant features. However, it could be improved with attention modulation and memory processing, two important properties of the primate visual cortex. In this paper, drawing on recent biological research on the primate visual cortex, we enhance the HMAX model while still mimicking only the first 100-150 ms of visual cognition, focusing mainly on the unsupervised feedforward feature learning process. The main modifications are as follows: (1) To mimic the attention modulation mechanism of the V1 layer, a bottom-up saliency map is computed in the S1 layer of the HMAX model, which supports the initial feature extraction for memory processing; (2) To mimic the learning, clustering, and short-term to long-term memory conversion abilities of V2 and IT, an unsupervised iterative clustering method is used to learn clusters of multiscale middle-level patches, which are taken as long-term memory; (3) Inspired by the multiple feature encoding mode of the primate visual cortex, information including color, orientation, and spatial position is encoded progressively in different layers of the HMAX model. With a softmax layer added at the top of the model, multiclass categorization experiments can be conducted. Results on Caltech101 show that the enhanced model, with a smaller memory size, achieves higher accuracy than the original HMAX model and can also achieve better accuracy than other unsupervised feature learning methods on the multiclass categorization task.
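To make the position invariance mentioned in the abstract concrete, the following is a minimal sketch of the first two HMAX stages: an S1-style Gabor filtering step followed by a C1-style local max pooling step. It is not the authors' implementation. The function names, the kernel parameters (`size`, `wavelength`, `sigma`, `gamma`), the non-overlapping pooling grid, and the toy bar image are all assumptions chosen for illustration, and the sketch omits the saliency map, the clustering-based memory, the color and spatial encoding, and the pooling across scales described above.

```python
import math

def gabor_kernel(size=5, theta=0.0, wavelength=4.0, sigma=2.0, gamma=0.3):
    """Zero-mean Gabor kernel (size must be odd), returned as a list of rows.
    All parameter values are illustrative, not taken from the paper."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the filter is tuned to orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def s1_layer(image, kernel):
    """S1-style simple cells: absolute filter response at every valid position."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[abs(sum(kernel[a][b] * image[i + a][j + b]
                     for a in range(k) for b in range(k)))
             for j in range(cols)]
            for i in range(rows)]

def c1_layer(s1, pool=4):
    """C1-style complex cells: max over non-overlapping pool x pool neighborhoods
    (a simplification; HMAX variants typically use overlapping neighborhoods)."""
    return [[max(s1[i + a][j + b] for a in range(pool) for b in range(pool))
             for j in range(0, len(s1[0]) - pool + 1, pool)]
            for i in range(0, len(s1) - pool + 1, pool)]

def bar_image(col, size=16):
    """Toy input: a one-pixel-wide vertical bar on a blank background."""
    return [[1.0 if c == col else 0.0 for c in range(size)] for _ in range(size)]

def maps_close(m1, m2, tol=1e-12):
    """Element-wise comparison of two equally sized response maps."""
    return all(math.isclose(p, q, abs_tol=tol)
               for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2))

kernel = gabor_kernel()
s1_a, s1_b = s1_layer(bar_image(6), kernel), s1_layer(bar_image(7), kernel)
c1_a, c1_b = c1_layer(s1_a), c1_layer(s1_b)
print("S1 maps match:", maps_close(s1_a, s1_b))
print("C1 maps match:", maps_close(c1_a, c1_b))
```

In this toy setup the bar is shifted by one pixel between the two inputs. The S1 responses move with the bar, while the pooled C1 responses should stay the same because the shift keeps the peak responses inside the same pooling neighborhoods. Larger shifts that cross a pooling boundary would change the C1 map, which is why HMAX stacks further pooling stages on top.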