Zhang Jianming, Huang Benben, Ye Zi, Kuang Li-Dan, Ning Xin
School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, 410114, China.
Hunan Provincial Key Laboratory of Intelligent Processing of Big Data On Transportation, Changsha University of Science and Technology, Changsha, 410114, China.
Sci Rep. 2021 Nov 25;11(1):22908. doi: 10.1038/s41598-021-02095-4.
Recently, object trackers based on Siamese networks have attracted considerable attention due to their remarkable tracking performance and widespread application. In particular, anchor-based methods exploit a region proposal subnetwork to predict the target accurately and have greatly improved performance. However, those trackers do not capture spatial information well, and their pre-defined anchors hinder robustness. To solve these problems, we propose a Siamese-based anchor-free object tracking algorithm with multiscale spatial attention. First, we take ResNet-50 as the backbone network to generate multiscale features of both the template patch and the search region. Second, we propose the spatial attention extraction (SAE) block to capture the spatial information among all positions in the template and search region feature maps. Third, we feed these features into the SAE block to obtain multiscale spatial attention. Finally, an anchor-free classification and regression subnetwork predicts the location of the target. Unlike anchor-based methods, our tracker directly predicts the target position without predefined parameters. Extensive comparisons with state-of-the-art trackers are carried out on four challenging visual object tracking benchmarks: OTB100, UAV123, VOT2016 and GOT-10k. The experimental results confirm the effectiveness of the proposed tracker.
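To make the idea of predicting the target position directly, without predefined anchors, more concrete, the following is a minimal sketch of how an anchor-free head's output could be decoded into a bounding box. It is not the authors' implementation. It assumes an FCOS-style parameterization in which each grid cell carries a foreground score and four distances (left, top, right, bottom) to the box sides; the function name `decode_anchor_free` and the `stride` and `origin` parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def decode_anchor_free(
    scores: List[List[float]],
    offsets: List[List[Box]],
    stride: int = 8,
    origin: float = 0.0,
) -> Box:
    """Return (x1, y1, x2, y2) for the highest-scoring grid cell.

    scores[i][j]  : foreground score at row i, column j.
    offsets[i][j] : assumed (left, top, right, bottom) distances, in pixels,
                    from the cell's image-space point to the box sides.
    stride/origin : hypothetical mapping from grid index to image pixels.
    """
    # Pick the grid cell with the highest classification score.
    best_i, best_j = max(
        ((i, j) for i in range(len(scores)) for j in range(len(scores[0]))),
        key=lambda ij: scores[ij[0]][ij[1]],
    )
    # Map the cell back to a point in the search image.
    cx = origin + best_j * stride
    cy = origin + best_i * stride
    # Turn the four predicted distances into box corners.
    left, top, right, bottom = offsets[best_i][best_j]
    return (cx - left, cy - top, cx + right, cy + bottom)


# Toy 2x2 score map; the bottom-right cell has the highest score.
scores = [[0.1, 0.2], [0.3, 0.9]]
offsets = [
    [(1.0, 1.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0)],
    [(1.0, 1.0, 1.0, 1.0), (4.0, 3.0, 6.0, 5.0)],
]
box = decode_anchor_free(scores, offsets, stride=8)
print(box)
```

Because the box is regressed from the cell location itself, no anchor scales or aspect ratios need to be chosen in advance, which is the property the abstract contrasts with anchor-based trackers.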