Enhancement of ELDA Tracker Based on CNN Features and Adaptive Model Update

Authors

Gao Changxin, Shi Huizhang, Yu Jin-Gang, Sang Nong

Affiliations

National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China.

Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68503, USA.

Publication

Sensors (Basel). 2016 Apr 15;16(4):545. doi: 10.3390/s16040545.

DOI: 10.3390/s16040545

PMID: 27092505

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4851059/

Abstract

Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. The exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Building on it, we improve the ELDA tracking algorithm with deep convolutional neural network (CNN) features and adaptive model updating. Deep CNN features have been used successfully in various computer vision tasks, but extracting them on every candidate window is time-consuming. To address this problem, we propose a two-step CNN feature extraction method that computes the convolutional layers and the fully-connected layers separately. Because CNN features and the exemplar-based model are both strongly discriminative, we update both the object and background models to improve their adaptivity and to balance the tradeoff between discriminative ability and adaptivity. We propose an object updating method that selects "good" models (detectors), i.e., models that are highly discriminative and uncorrelated with the other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM), initialized offline and updated online, so that it adapts to complex scenes. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences covering various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm.
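The idea behind the two-step extraction can be illustrated with a toy sketch. The expensive convolutional part runs once on the whole search region, and each candidate window then reuses a crop of that shared feature map, so only the fully-connected part runs per window. Everything in the sketch is an assumption for illustration. The "convolutional stack" is a hypothetical 2x2 sum pooling with stride 2, and the "fully-connected stack" is a single hypothetical linear unit. Neither is the network used in the paper.

```python
# Toy sketch of two-step feature extraction: conv layers once, FC layers per window.
# conv_layers / fc_layers are hypothetical stand-ins, not the paper's CNN.

STRIDE = 2  # assumed total stride of the toy conv stack


def conv_layers(image):
    """Toy 'convolutional stack': 2x2 sum pooling with stride STRIDE."""
    h, w = len(image) // STRIDE, len(image[0]) // STRIDE
    return [[sum(image[STRIDE * i + di][STRIDE * j + dj]
                 for di in range(STRIDE) for dj in range(STRIDE))
             for j in range(w)]
            for i in range(h)]


def fc_layers(feature_patch, weights):
    """Toy 'fully-connected stack': one linear unit over the flattened patch."""
    flat = [v for row in feature_patch for v in row]
    return sum(f * w for f, w in zip(flat, weights))


def crop(grid, top, left, size):
    """Square crop of a 2-D list."""
    return [row[left:left + size] for row in grid[top:top + size]]


def score_windows_two_step(image, windows, size, weights):
    """Step 1: run the conv layers once on the whole search region.
    Step 2: for each window (top, left in pixels), crop the shared feature map
    at the matching location and run only the FC layers."""
    fmap = conv_layers(image)
    fsize = size // STRIDE
    return [fc_layers(crop(fmap, t // STRIDE, l // STRIDE, fsize), weights)
            for t, l in windows]


def score_windows_naive(image, windows, size, weights):
    """Baseline: run the full toy network separately on every window crop."""
    return [fc_layers(conv_layers(crop(image, t, l, size)), weights)
            for t, l in windows]


if __name__ == "__main__":
    image = [[(3 * r + c) % 7 for c in range(8)] for r in range(8)]
    windows = [(0, 0), (2, 4), (4, 2)]  # stride-aligned top-left corners
    weights = [1, -1, 2, 0.5]           # one weight per cell of a 2x2 feature patch
    print(score_windows_two_step(image, windows, 4, weights)
          == score_windows_naive(image, windows, 4, weights))  # True
```

For windows aligned to the assumed stride, the toy two-step scores match the naive per-window scores exactly. With a real CNN, padding and receptive-field effects make the shared-map crop an approximation rather than an exact equivalent.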
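For the ELDA model itself, the usual exemplar-LDA formulation learns a linear detector from a single positive exemplar x_E and background statistics (mean mu_0, covariance Sigma) estimated offline: w = Sigma^{-1} (x_E - mu_0). Candidates are then ranked by w . z. The pure-Python sketch below shows only that formulation. The regularization constant, the toy 2-D features, and all helper names are assumptions for illustration. It also omits the paper's GMM background model and CNN features.

```python
# Minimal exemplar-LDA sketch: w = Sigma^{-1} (x_E - mu_0), score = w . z.
# Toy 2-D features and the regularizer are illustrative assumptions.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def background_stats(samples, reg=1e-3):
    """Mean and (ridge-regularized) covariance of generic background features."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[k] for s in samples) / n for k in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
            + (reg if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    return mu, cov


def elda_detector(exemplar, mu, cov):
    """Linear detector for one exemplar against the shared background model."""
    return solve(cov, [x - m for x, m in zip(exemplar, mu)])


def score(w, z):
    """Detector response; the bias is omitted since it does not change the argmax."""
    return sum(wi * zi for wi, zi in zip(w, z))


if __name__ == "__main__":
    background = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
    mu0, sigma = background_stats(background)
    w = elda_detector([3.0, 2.0], mu0, sigma)
    candidates = [[0.4, 0.6], [2.9, 2.1], [1.0, 1.0]]
    best = max(range(len(candidates)), key=lambda i: score(w, candidates[i]))
    print(best)  # 1, the candidate that resembles the exemplar
```

Because Sigma and mu_0 are shared, each new exemplar costs only one linear solve. This is what makes it cheap to keep several exemplar detectors for the object.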
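The abstract does not say exactly how "good" detectors are chosen. One hypothetical reading of "highly discriminative and uncorrelated with the other selected models" is a greedy filter. Visit detectors from most to least discriminative, and keep a detector only if its weight vector is not strongly correlated with any detector already kept. The discriminativeness scores, the cosine-similarity measure, and the max_corr threshold below are all assumptions. This sketch is not the paper's selection rule.

```python
import math

# Hypothetical greedy selection of discriminative, mutually uncorrelated detectors.
# Correlation is measured as cosine similarity of (non-zero) weight vectors.


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def select_models(models, disc_scores, k, max_corr=0.9):
    """Return indices of up to k models, most discriminative first, skipping any
    model whose |cosine| with an already-kept model is >= max_corr."""
    order = sorted(range(len(models)), key=lambda i: disc_scores[i], reverse=True)
    kept = []
    for i in order:
        if len(kept) == k:
            break
        if all(abs(cosine(models[i], models[j])) < max_corr for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    models = [[1, 0], [0.99, 0.1], [0, 1], [1, 1]]
    disc = [0.9, 0.95, 0.5, 0.7]
    print(select_models(models, disc, 3))  # [1, 3, 2]
```

In this example, model 0 is dropped despite its high score because it is nearly parallel to model 1, which was already kept.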
