IEEE Trans Cybern. 2022 Nov;52(11):12150-12162. doi: 10.1109/TCYB.2021.3070677. Epub 2022 Oct 17.
The correlation filter (CF) and the Siamese network have recently become the two most popular frameworks in object tracking. Existing CF trackers, however, are limited in feature learning and context usage, which makes them sensitive to boundary effects. Siamese trackers, in contrast, are easily disturbed by semantic distractors. To address these problems, we propose an end-to-end target-insight correlation network (TICNet) for object tracking, which aims to overcome both limitations within a unified network. TICNet is an asymmetric dual-branch network comprising a target-background awareness model (TBAM), a spatial-channel attention network (SCAN), and a distractor-aware filter (DAF), all trained end to end. Specifically, TBAM distinguishes the target from the background at the pixel level, yielding a target likelihood map based on color statistics that is used to mine distractors for DAF learning. SCAN consists of a basic convolutional network, a channel-attention network, and a spatial-attention network, and generates attentive weights that enhance the tracker's representation learning. In particular, we formulate a differentiable DAF and employ it as a learnable layer in the network, which helps suppress distracting regions in the background. During testing, DAF, together with TBAM, yields a response map for the final target estimation. Extensive experiments on seven benchmarks demonstrate that TICNet outperforms state-of-the-art methods while running at real-time speed.
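The abstract does not give TBAM's exact formulation, so the following is only a minimal sketch of one common way to build a pixel-level target likelihood map from color statistics: quantized foreground and background color histograms combined with Bayes' rule. The function name `color_likelihood_map`, the `bins` parameter, and the binary `target_mask` input are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def color_likelihood_map(image, target_mask, bins=16):
    """Per-pixel target likelihood from color histograms (illustrative only).

    image:       (H, W, 3) uint8 array.
    target_mask: (H, W) boolean array, True inside the target region.
    Returns an (H, W) float array with values in [0, 1].
    """
    # Quantize each channel into `bins` levels and fuse into one bin index.
    q = (image.astype(np.int64) * bins) // 256
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]

    # Count how often each color bin occurs in target and background pixels.
    n_bins = bins ** 3
    fg_hist = np.bincount(idx[target_mask], minlength=n_bins).astype(np.float64)
    bg_hist = np.bincount(idx[~target_mask], minlength=n_bins).astype(np.float64)

    # P(target | color) per bin; colors never observed get 0.5 (uninformative).
    total = fg_hist + bg_hist
    prob = np.full(n_bins, 0.5)
    seen = total > 0
    prob[seen] = fg_hist[seen] / total[seen]

    # Look up the likelihood of every pixel's color bin.
    return prob[idx]


# Synthetic example: a red square on a black background.
image = np.zeros((8, 8, 3), dtype=np.uint8)
image[2:6, 2:6] = (255, 0, 0)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
likelihood = color_likelihood_map(image, mask)
```

In a map of this kind, background pixels whose colors resemble the target receive high likelihood, which is one plausible way such regions could be flagged as distractors for a filter like DAF; how TICNet actually mines them is not specified in the abstract.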