Murray Joseph F, Kreutz-Delgado Kenneth
Massachusetts Institute of Technology, Brain and Cognitive Sciences Department, Cambridge, MA 02139, USA.
Neural Comput. 2007 Sep;19(9):2301-52. doi: 10.1162/neco.2007.19.9.2301.
We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.
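The efficiency idea in the abstract, updating only a small subset of a large weight matrix when the layers are sparse, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a generic outer-product update of the form W[i][j] -= lr * error[i] * activity[j], and the names `sparse_outer_update`, `dense_outer_update`, `error`, `activity`, and `lr` are hypothetical. The sketch only shows that when most entries of both vectors are zero, skipping them gives the same result as the dense update while touching far fewer weights.

```python
def sparse_outer_update(weights, error, activity, lr=0.1):
    """Apply W[i][j] -= lr * error[i] * activity[j], skipping zero factors.

    Assumed update rule for illustration only (not taken from the paper).
    Returns the number of weight entries that were actually touched.
    """
    # Indices of the nonzero entries; with sparse layers these lists are short.
    active_rows = [i for i, e in enumerate(error) if e != 0.0]
    active_cols = [j for j, a in enumerate(activity) if a != 0.0]
    for i in active_rows:
        row = weights[i]
        for j in active_cols:
            row[j] -= lr * error[i] * activity[j]
    return len(active_rows) * len(active_cols)


def dense_outer_update(weights, error, activity, lr=0.1):
    """Reference version that visits every entry of the weight matrix."""
    for i, e in enumerate(error):
        for j, a in enumerate(activity):
            weights[i][j] -= lr * e * a


# Toy data: 2 of 4 error units and 2 of 6 activity units are nonzero.
error = [0.0, 0.5, 0.0, -1.0]
activity = [0.0, 0.0, 2.0, 0.0, 1.0, 0.0]

w_sparse = [[0.0] * len(activity) for _ in error]
w_dense = [[0.0] * len(activity) for _ in error]

touched = sparse_outer_update(w_sparse, error, activity)
dense_outer_update(w_dense, error, activity)

total = len(error) * len(activity)
print(f"touched {touched} of {total} weights")
print("same result as dense update:", w_sparse == w_dense)
```

In this toy case the sparse version writes to 4 of the 24 weights. The saving grows with the product of the two sparsity levels, which is why the approach matters for the large networks the abstract mentions.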