Yuan Di, Chang Xiaojun, Huang Po-Yao, Liu Qiao, He Zhenyu
IEEE Trans Image Process. 2021;30:976-985. doi: 10.1109/TIP.2020.3037518. Epub 2020 Dec 9.
Training a feature extraction network typically requires abundant manually annotated samples, which makes the process time-consuming and costly. Accordingly, we propose an effective self-supervised learning-based tracker in a deep correlation framework (named self-SDCT). Motivated by the forward-backward tracking consistency of a robust tracker, we propose a multi-cycle consistency loss as the self-supervision signal for learning a feature extraction network from adjacent video frames. At the training stage, we generate pseudo-labels for consecutive video frames by forward-backward prediction under a Siamese correlation tracking framework and use the proposed multi-cycle consistency loss to learn the feature extraction network. Furthermore, we propose a similarity dropout strategy that drops low-quality training sample pairs, and we adopt a cycle trajectory consistency loss in each sample pair to improve the training loss function. At the tracking stage, we employ the pre-trained feature extraction network to extract features and use a Siamese correlation tracking framework to locate the target with forward tracking alone. Extensive experimental results indicate that the proposed self-supervised deep correlation tracker (self-SDCT) achieves competitive tracking performance compared with state-of-the-art supervised and unsupervised tracking methods on standard evaluation benchmarks.
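The forward-backward idea can be illustrated with a minimal NumPy sketch. It is not the self-SDCT implementation: the abstract does not give the exact form of the losses, so the closed-form multi-channel correlation filter, the mean-squared-error cycle loss, the way cycles of different lengths are summed, and the loss-based dropout rule are all assumptions, and every function name (`gaussian_label`, `track`, `cycle_loss`, `multi_cycle_loss`, `dropout_mask`) is hypothetical. Feature maps are assumed to be arrays of shape `(channels, height, width)` already produced by some feature extractor.

```python
import numpy as np

def gaussian_label(size, sigma=2.0):
    """Gaussian response map peaked at (0, 0), with circular wrap-around."""
    h, w = size
    ys = np.minimum(np.arange(h), h - np.arange(h))
    xs = np.minimum(np.arange(w), w - np.arange(w))
    d2 = ys[:, None] ** 2 + xs[None, :] ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def track(template, label, search, lam=1e-4):
    """Fit a correlation filter on (template, label) and apply it to search.

    template, search: (C, H, W) feature maps; label: (H, W) response map.
    The filter is the closed-form ridge-regression solution in the
    Fourier domain (an assumed stand-in for the paper's tracker).
    """
    X = np.fft.fft2(template)          # FFT over the last two axes
    Z = np.fft.fft2(search)
    Y = np.fft.fft2(label)
    energy = np.sum(X * np.conj(X), axis=0).real + lam
    W = Y[None] * np.conj(X) / energy  # one filter per channel
    return np.real(np.fft.ifft2(np.sum(W * Z, axis=0)))

def cycle_loss(frames, label):
    """Track forward through frames, then backward, and compare with label."""
    path = list(frames) + list(frames[-2::-1])   # e.g. f1, f2, f3, f2, f1
    response = label
    for src, dst in zip(path[:-1], path[1:]):
        # The response on dst serves as the pseudo-label for the next step.
        response = track(src, response, dst)
    return float(np.mean((response - label) ** 2))

def multi_cycle_loss(frames, label):
    """Assumed form: sum of cycle losses over cycles of increasing length."""
    return sum(cycle_loss(frames[:k], label) for k in range(2, len(frames) + 1))

def dropout_mask(losses, drop_ratio=0.1):
    """Hypothetical dropout rule: discard the pairs with the highest loss."""
    losses = np.asarray(losses, dtype=float)
    n_drop = int(len(losses) * drop_ratio)
    mask = np.ones(len(losses), dtype=bool)
    if n_drop > 0:
        mask[np.argsort(losses)[-n_drop:]] = False
    return mask
```

With features that are circular shifts of one another, the backward pass returns to the initial label and the loss is close to zero. With unrelated features, the cycle does not close and the loss grows, which is the signal a feature extractor could be trained against. At tracking time only the forward call to `track` would be needed.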