Learning where to attend with deep architectures for image tracking.

Affiliation

University of British Columbia, Vancouver, BC V6G 1Z4, Canada.

Publication information

Neural Comput. 2012 Aug;24(8):2151-84. doi: 10.1162/NECO_a_00312. Epub 2012 Apr 17.

Abstract

We discuss an attentional model for simultaneous object tracking and recognition that is driven by gaze data. Motivated by theories of perception, the model consists of two interacting pathways, identity and control, intended to mirror the what and where pathways in neuroscience models. The identity pathway models object appearance and performs classification using deep (factored) restricted Boltzmann machines. At each point in time, the observations consist of foveated images, with decaying resolution toward the periphery of the gaze. The control pathway models the location, orientation, scale, and speed of the attended object. The posterior distribution of these states is estimated with particle filtering. Deeper in the control pathway, we encounter an attentional mechanism that learns to select gazes so as to minimize tracking uncertainty. Unlike in our previous work, we introduce gaze selection strategies that operate in the presence of partial information and on a continuous action space. We show that a straightforward extension of the existing approach to the partial information setting results in poor performance, and we propose an alternative method based on modeling the reward surface as a Gaussian process. This approach gives good performance in the presence of partial information and allows us to expand the action space from a small, discrete set of fixation points to a continuous domain.
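The idea of modeling the reward surface as a Gaussian process over a continuous gaze space can be illustrated with a minimal sketch. It is not the authors' implementation: the squared-exponential kernel, the upper-confidence-bound selection rule, the toy reward, and all names (`rbf_kernel`, `gp_posterior`, `select_gaze`) are assumptions introduced here for illustration, since the abstract does not specify them.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential kernel between two sets of 2-D gaze locations (assumed kernel)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_posterior(x_obs, r_obs, x_query, noise=1e-2):
    """Posterior mean and standard deviation of the reward at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    chol = np.linalg.cholesky(k_oo)
    # Solve k_oo @ alpha = r_obs through the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, r_obs))
    mean = k_qo @ alpha
    v = np.linalg.solve(chol, k_qo.T)
    prior_var = np.diag(rbf_kernel(x_query, x_query))
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def select_gaze(x_obs, r_obs, candidates, explore=1.0):
    """Pick the candidate gaze with the highest upper confidence bound (assumed rule)."""
    mean, std = gp_posterior(x_obs, r_obs, candidates)
    return candidates[np.argmax(mean + explore * std)]


# Toy usage: rewards observed at a few past gazes, then choose the next one.
rng = np.random.default_rng(0)
x_obs = rng.uniform(-1.0, 1.0, size=(8, 2))      # past gaze offsets (hypothetical)
r_obs = -np.sum((x_obs - 0.3) ** 2, axis=1)      # hypothetical reward, e.g. negative uncertainty
candidates = rng.uniform(-1.0, 1.0, size=(200, 2))
next_gaze = select_gaze(x_obs, r_obs, candidates)
```

Because the surrogate is defined everywhere in the gaze space, candidates can be sampled from a continuous domain rather than a fixed grid of fixation points, and the posterior standard deviation gives a handle on gazes whose reward has only been partially observed.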
