IEEE Trans Image Process. 2022;31:1882-1894. doi: 10.1109/TIP.2022.3148876. Epub 2022 Feb 16.
Recently, Siamese-based trackers have achieved significant success. However, these trackers are limited by the difficulty of learning a feature representation that stays consistent with the object. To address this challenge, this paper proposes a novel Siamese implicit region proposal network with compound attention for visual tracking. First, an implicit region proposal (IRP) module is designed by incorporating a novel pixel-wise correlation method. This module aggregates feature information from different regions that resemble the pre-defined anchor boxes of a Region Proposal Network. Adaptive feature receptive fields can then be obtained by linear fusion of the features from these regions. Second, a compound attention module, consisting of channel attention and non-local attention, is proposed to help the IRP module better perceive the scale and shape of the object. The channel attention mines discriminative information about the object to handle background clutter in the template, while the non-local attention is trained to aggregate contextual information and learn the semantic range of the object. Finally, experimental results demonstrate that the proposed tracker achieves state-of-the-art performance on six challenging benchmarks: VOT-2018, VOT-2019, OTB-100, GOT-10k, LaSOT, and TrackingNet. The tracker also runs in real time, at an average speed of 72 FPS.
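The building blocks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the code uses generic textbook forms of each operation. The function names, the tensor shapes, the squeeze-and-excitation style gating, the parameter-free non-local block, and the fusion weights `fuse_w` are all assumptions made for illustration.

```python
import numpy as np


def pixelwise_correlation(template, search):
    """Generic pixel-wise correlation (assumed form, not the paper's variant).

    Each template pixel is treated as a 1x1 kernel of C channels and is
    correlated with every search pixel.
    template: (C, Hz, Wz), search: (C, Hx, Wx) -> response: (Hz*Wz, Hx, Wx)
    """
    c = template.shape[0]
    kernels = template.reshape(c, -1)  # (C, Hz*Wz), one column per template pixel
    return np.einsum("cn,chw->nhw", kernels, search)


def linear_fusion(response, fuse_w):
    """Linearly mix the per-region response maps (equivalent to a 1x1 conv).

    response: (N, H, W), fuse_w: (K, N) hypothetical weights -> (K, H, W)
    """
    return np.einsum("kn,nhw->khw", fuse_w, response)


def channel_attention(feat, w1, w2):
    """SE-style channel gating: global average pool -> 2-layer MLP -> sigmoid.

    feat: (C, H, W), w1: (C_r, C), w2: (C, C_r); weights are hypothetical.
    """
    squeezed = feat.mean(axis=(1, 2))             # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)       # ReLU, (C_r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # (C,), each value in (0, 1)
    return feat * gate[:, None, None]             # rescale every channel


def non_local_attention(feat):
    """Parameter-free non-local block with a residual connection.

    Every spatial position aggregates all positions, weighted by a softmax
    over scaled dot-product similarity.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                        # (C, N)
    logits = x.T @ x / np.sqrt(c)                     # (N, N) pairwise similarity
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)     # each row sums to 1
    context = x @ weights.T                           # (C, N) aggregated context
    return feat + context.reshape(c, h, w)
```

In this sketch, a template feature of shape `(C, Hz, Wz)` yields `Hz*Wz` response maps over the search region, and `linear_fusion` mixes them into `K` maps. That mixing is one plausible reading of how fusing features from different regions could produce adaptive receptive fields.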