IEEE Trans Image Process. 2015 Apr;24(4):1424-35. doi: 10.1109/TIP.2015.2403231. Epub 2015 Feb 12.
In this paper, we propose an approach to learning hierarchical features for visual object tracking. First, we learn features that are robust to diverse motion patterns offline from auxiliary video sequences. The hierarchical features are learned with a two-layer convolutional neural network. Embedding a temporal slowness constraint in the stacked architecture makes the learned features robust to complicated motion transformations, which is important for visual object tracking. Then, given a target video sequence, a domain adaptation module adapts the pre-learned features online to the specific target object. The adaptation is carried out in both layers of the deep feature learning module so that the features incorporate appearance information of the specific target object. As a result, the learned hierarchical features can be robust to both complicated motion transformations and appearance changes of the target object. We integrate our feature learning algorithm into three tracking methods. Experimental results demonstrate that the learned hierarchical features yield significant improvement, especially on video sequences with complicated motion transformations.
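The temporal slowness constraint mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the code below assumes a generic form of the idea: penalize the L1 distance between the feature vectors of the same tracked patch in consecutive frames, so that features changing slowly over time incur a small penalty. The function name `temporal_slowness_penalty` and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def temporal_slowness_penalty(features: Sequence[Sequence[float]]) -> float:
    """Sum of L1 distances between feature vectors of consecutive frames.

    Assumed generic form of a slowness penalty, not the paper's exact objective.
    """
    penalty = 0.0
    for prev, curr in zip(features, features[1:]):
        if len(prev) != len(curr):
            raise ValueError("all feature vectors must have the same length")
        penalty += sum(abs(c - p) for p, c in zip(prev, curr))
    return penalty


# Hypothetical features of one tracked patch over three frames.
slow = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.1]]  # drifts gradually
fast = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # flips between frames

print(temporal_slowness_penalty(slow) < temporal_slowness_penalty(fast))
```

In a feature learning setup, a term of this kind would typically be added to the training loss with a weighting factor, which encourages the learned representation to stay stable while the object undergoes motion transformations.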