Li Xin, Pei Wenjie, Wang Yaowei, He Zhenyu, Lu Huchuan, Yang Ming-Hsuan
IEEE Trans Neural Netw Learn Syst. 2024 Jul;35(7):9186-9197. doi: 10.1109/TNNLS.2022.3231537. Epub 2024 Jul 10.
While deep-learning-based tracking methods have achieved substantial progress, they require large-scale, high-quality annotated data for sufficient training. To avoid expensive and exhaustive annotation, we study self-supervised (SS) learning for visual tracking. In this work, we develop the crop-transform-paste operation, which synthesizes sufficient training data by simulating the appearance variations that arise during tracking, including appearance changes of the object and background interference. Since the target state is known in all synthesized data, existing deep trackers can be trained in routine ways on the synthesized data without human annotation. The proposed target-aware data-synthesis method adapts existing tracking approaches within an SS learning framework without algorithmic changes. Thus, the proposed SS learning mechanism can be seamlessly integrated into existing tracking frameworks for training. Extensive experiments show that our method: 1) achieves favorable performance against supervised (Su) learning schemes when annotations are limited; 2) helps handle various tracking challenges such as object deformation, occlusion (OCC), or background clutter (BC) due to its manipulability; 3) performs favorably against state-of-the-art unsupervised tracking methods; and 4) boosts the performance of various state-of-the-art Su learning frameworks, including SiamRPN++, DiMP, and TransT.
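The crop-transform-paste idea can be illustrated with a minimal sketch: crop the target from a frame, apply appearance transforms, and paste it onto a background image at a random location, which yields a free bounding-box label. The function names, the specific transforms (flip, brightness jitter), and the uniform paste placement below are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def crop(image, box):
    """Crop the target patch given a box (x, y, w, h)."""
    x, y, w, h = box
    return image[y:y + h, x:x + w].copy()

def transform(patch, rng):
    """Simulate appearance variations: random flip and brightness jitter
    (stand-ins for the richer transforms described in the paper)."""
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    gain = rng.uniform(0.7, 1.3)  # brightness jitter
    return np.clip(patch.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def paste(background, patch, rng):
    """Paste the patch at a random location; since we choose the location,
    the target box is known exactly and serves as the training label."""
    H, W = background.shape[:2]
    h, w = patch.shape[:2]
    x = int(rng.integers(0, W - w + 1))
    y = int(rng.integers(0, H - h + 1))
    out = background.copy()
    out[y:y + h, x:x + w] = patch
    return out, (x, y, w, h)

def synthesize(frame, box, background, rng):
    """One crop-transform-paste round: returns (synthetic image, target box)."""
    patch = transform(crop(frame, box), rng)
    return paste(background, patch, rng)

# Toy usage with random images standing in for video frames.
rng = np.random.default_rng(0)
frame = rng.integers(0, 255, (128, 128, 3), dtype=np.uint8)
background = rng.integers(0, 255, (128, 128, 3), dtype=np.uint8)
image, label = synthesize(frame, (40, 40, 32, 32), background, rng)
```

Because the synthesized `(image, label)` pairs have exact target states, they can be fed to any existing tracker's standard training loop, which is why the method integrates without algorithmic changes.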