Lu Xiankai, Ma Chao, Shen Jianbing, Yang Xiaokang, Reid Ian, Yang Ming-Hsuan
IEEE Trans Pattern Anal Mach Intell. 2022 May;44(5):2386-2401. doi: 10.1109/TPAMI.2020.3041332. Epub 2022 Apr 1.
In this paper, we address the issue of data imbalance in learning deep models for visual object tracking. Although it is well known that data distribution plays a crucial role in learning and inference models, considerably less attention has been paid to data imbalance in visual tracking. For deep regression trackers that directly learn a dense mapping from input images of target objects to soft response maps, we identify that their performance is limited by the extremely imbalanced pixel-to-pixel differences when computing the regression loss. This prevents existing end-to-end learnable deep regression trackers from performing as well as discriminative correlation filter (DCF) trackers. For deep classification trackers that draw positive and negative samples to learn discriminative classifiers, there exists heavy class imbalance because positive samples are far outnumbered by negative samples. To balance the training data, we propose a novel shrinkage loss that penalizes the importance of easy training data, which mostly comes from the background, and thereby helps both deep regression and classification trackers better distinguish target objects from the background. We extensively validate the proposed shrinkage loss function on six benchmark datasets: OTB-2013, OTB-2015, UAV-123, VOT-2016, VOT-2018 and LaSOT. Equipped with our shrinkage loss, the proposed one-stage deep regression tracker achieves favorable results against state-of-the-art methods, especially in comparison with DCF trackers. Meanwhile, our shrinkage loss generalizes well to deep classification trackers. When the original binary cross-entropy loss is replaced with our shrinkage loss, three representative baseline trackers achieve large performance gains, even setting new state-of-the-art results.
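To illustrate the idea of down-weighting easy training data, the following is a minimal sketch of a shrinkage-style loss: the squared regression error is modulated by a sigmoid-shaped factor that shrinks toward zero for easy samples (small absolute error) while leaving hard samples nearly at full weight. The function name and the parameters `a` (shrinkage speed) and `c` (localization of the shrinkage region) are illustrative assumptions, not specifics taken from the abstract.

```python
import numpy as np

def shrinkage_loss(pred, target, a=10.0, c=0.2):
    """Hypothetical sketch of a shrinkage-style regression loss.

    The modulating factor is near 0 when the absolute error l is small
    (easy, mostly background pixels) and near 1 when l is large (hard
    pixels), so easy data contributes little to the total loss.
    """
    l = np.abs(pred - target)                        # per-pixel absolute error
    modulating = 1.0 / (1.0 + np.exp(a * (c - l)))   # ~0 for easy, ~1 for hard
    return np.mean(modulating * l ** 2)
```

Compared with plain squared error, an easy sample (e.g. `l = 0.01`) is suppressed by roughly an order of magnitude, while a hard sample (e.g. `l = 1.0`) keeps almost its full squared-error penalty.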