Zhao Dongxuan, Li Yunsong, Li Jiaxing, Duan Xiping, Ma Ning, Wang Yuhe
School of Computer Science and Information Engineering, Harbin Normal University, No. 1 Shida Road, Limin Economic Development Zone, Harbin 150025, China.
Sensors (Basel). 2025 May 20;25(10):3214. doi: 10.3390/s25103214.
To enhance the tracking performance of transformer-based trackers in complex scenes, we propose a novel visual object tracking method that incorporates three key components: a pyramid channel attention mechanism, a hierarchical cross-attention structure, and an attention-guided multi-layer perceptron. The pyramid channel attention mechanism dynamically enhances informative feature channels across different scales, while the hierarchical cross-attention structure facilitates effective feature interaction. The attention-guided multi-layer perceptron introduces nonlinear transformations under attention guidance to improve feature representation. Experimental results on benchmark datasets demonstrate the superior performance of the proposed method.
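The abstract does not specify how the pyramid channel attention mechanism is built, so the following is only a minimal, parameter-free sketch of the general idea of reweighting feature channels using statistics pooled at several scales. The function names (`avg_pool_bins`, `pyramid_channel_gates`, `apply_channel_attention`), the pyramid levels `(1, 2)`, and the choice of a max-over-bins descriptor followed by a sigmoid gate are all illustrative assumptions, not the authors' design; a practical implementation would most likely use learned weights.

```python
import math


def avg_pool_bins(channel, bins):
    """Average-pool one 2D channel into a bins x bins grid.

    Assumes `bins` divides both the height and width of the channel.
    """
    h, w = len(channel), len(channel[0])
    bh, bw = h // bins, w // bins
    pooled = []
    for i in range(bins):
        for j in range(bins):
            block = [channel[y][x]
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            pooled.append(sum(block) / len(block))
    return pooled


def pyramid_channel_gates(features, scales=(1, 2)):
    """Return one gate in (0, 1) per channel.

    Hypothetical descriptor: the strongest pooled response at each
    pyramid level, averaged over levels, then squashed by a sigmoid.
    """
    gates = []
    for channel in features:
        desc = sum(max(avg_pool_bins(channel, s)) for s in scales) / len(scales)
        gates.append(1.0 / (1.0 + math.exp(-desc)))
    return gates


def apply_channel_attention(features, scales=(1, 2)):
    """Scale every value in a channel by that channel's gate."""
    gates = pyramid_channel_gates(features, scales)
    return [[[v * g for v in row] for row in channel]
            for channel, g in zip(features, gates)]


# Two 2x2 channels: one silent, one with a single localized response.
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[4.0, 0.0], [0.0, 0.0]],
]
gates = pyramid_channel_gates(features)
reweighted = apply_channel_attention(features)
```

In this toy input the second channel receives the larger gate because its localized peak dominates the finer pyramid level, which is the kind of scale-dependent channel emphasis the abstract describes in general terms.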