Zhu Jiawen, Chen Xin, Diao Haiwen, Li Shuai, He Jun-Yan, Li Chenyang, Luo Bin, Wang Dong, Lu Huchuan
IEEE Trans Neural Netw Learn Syst. 2025 Aug;36(8):15502-15514. doi: 10.1109/TNNLS.2025.3545752.
The speed-precision tradeoff is a critical problem in visual object tracking, which typically demands low latency and deployment on resource-constrained platforms. Existing solutions for efficient tracking focus primarily on lightweight backbones or modules, which, however, sacrifice precision. In this article, inspired by dynamic network routing, we propose DyTrack, a dynamic transformer framework for efficient tracking. Real-world tracking scenarios exhibit varying levels of complexity. We argue that a simple network suffices for easy video frames, whereas more computational resources should be allocated to difficult ones. DyTrack automatically learns to configure proper reasoning routes for different inputs, thereby improving the utilization of the available computational budget and achieving higher performance at the same running speed. We formulate instance-specific tracking as a sequential decision problem and attach terminating branches to intermediate layers of the model. Furthermore, we propose a feature-recycling mechanism that maximizes computational efficiency by reusing the outputs of predecessor layers. In addition, a target-aware self-distillation strategy is designed to enhance the discriminative capability of early-stage predictions by mimicking the representation patterns of the deep model. Extensive experiments demonstrate that DyTrack achieves promising speed-precision tradeoffs with only a single model. For instance, DyTrack obtains 64.9% area under the curve (AUC) on LaSOT at a speed of 256 fps.
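The early-exit scheme described above can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the authors' code: it assumes each transformer stage is paired with a cheap terminating branch that scores the current feature, inference stops at the first stage whose score clears a confidence threshold, and the running feature is passed forward rather than recomputed (mirroring the feature-recycling idea).

```python
# Hypothetical sketch of early-exit inference with intermediate
# terminating branches, in the spirit of DyTrack: easy inputs exit at
# shallow layers, hard inputs fall through to the full depth.

def run_with_early_exit(layers, exit_heads, x, threshold=0.8):
    """Run `layers` sequentially; after each layer, the paired exit head
    scores the current feature. Return at the first score >= threshold
    (easy input) or after the final layer (hard input). The feature is
    reused across stages instead of being recomputed, a simplified
    stand-in for the paper's feature-recycling mechanism."""
    feat = x
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        feat = layer(feat)           # refine the shared feature
        confidence = head(feat)      # cheap terminating branch
        if confidence >= threshold:
            return feat, depth       # early exit: remaining budget saved
    return feat, len(layers)         # deepest route used
```

With toy stand-ins for the layers and heads (e.g., `layers = [lambda v: v + 1] * 4`, `exit_heads = [lambda v: v / 10.0] * 4`), an "easy" input exits after one stage while a "hard" one runs deeper, so per-frame cost adapts to input difficulty without switching models.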