School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China; Hubei Luojia Laboratory, Wuhan 430200, China.
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China.
Neural Netw. 2024 Aug;176:106380. doi: 10.1016/j.neunet.2024.106380. Epub 2024 May 8.
Most trackers formulate visual tracking as a combination of classification and regression (i.e., bounding box regression) tasks. Correlation features computed through depth-wise convolution or channel-wise multiplication are fed into both the classification and regression branches for inference. However, this linear correlation matching tends to lose semantic information and reaches only a local optimum. Moreover, these trackers rely on an unreliable ranking based on the classification score and use the intersection over union (IoU) loss for regression training, both of which degrade tracking performance. In this paper, we introduce a deformable transformer model that effectively computes the correlation features between the training and search sets. A new loss, the quality-aware focal loss (QAFL), is used to train the classification network; it alleviates the inconsistency between the classification and localization quality predictions. A new regression loss, α-GIoU, is used to train the regression network and improves localization accuracy. To further improve the tracker's robustness, the candidate object location is predicted by combining the classification scores with online learning scores produced by a transformer-assisted framework. Extensive experiments on six test datasets demonstrate the effectiveness of our method. In particular, the proposed method attains a success score of 71.7% on the OTB-2015 dataset and an AUC score of 67.3% on the NFS30 dataset.
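The abstract names the two training losses but does not give their formulas. The sketch below illustrates the general idea of each. It is not the authors' implementation and rests on the following assumptions:

- **α-GIoU.** This follows the Alpha-IoU form, 1 − IoU^α + (|C \ (A ∪ B)| / |C|)^α, where C is the smallest box enclosing both boxes. The default α = 3 is taken from the Alpha-IoU paper's recommendation.
- **QAFL.** This is stood in for by the Quality Focal Loss from Generalized Focal Loss (GFL), with β = 2. The paper's QAFL may differ in detail.
- **Inputs.** Boxes are assumed to be (x1, y1, x2, y2) with positive area.
- **Function names.** All function names are hypothetical.

```python
import math


def giou_terms(box_a, box_b):
    """Return (IoU, penalty) for boxes given as (x1, y1, x2, y2).

    penalty = |C minus (A union B)| / |C|, where C is the smallest enclosing box.
    Assumes both boxes have positive area (no degenerate boxes).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs.
    enc_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    penalty = (enc_area - union) / enc_area
    return iou, penalty


def alpha_giou_loss(pred, target, alpha=3.0):
    """Hypothetical alpha-GIoU loss in the Alpha-IoU form.

    With alpha = 1 this reduces to the ordinary GIoU loss, 1 - GIoU.
    """
    iou, penalty = giou_terms(pred, target)
    return 1.0 - iou ** alpha + penalty ** alpha


def quality_focal_loss(score, quality, beta=2.0, eps=1e-12):
    """GFL-style quality focal loss, used here as a stand-in for QAFL.

    score:   predicted sigmoid classification score in (0, 1)
    quality: soft target, e.g. IoU of the predicted box with the ground truth
    Loss = -|quality - score|**beta * BCE(score, quality)
    """
    s = min(max(score, eps), 1.0 - eps)  # clamp to avoid log(0)
    bce = -((1.0 - quality) * math.log(1.0 - s) + quality * math.log(s))
    # The modulating factor down-weights samples whose score already matches quality.
    return abs(quality - score) ** beta * bce


if __name__ == "__main__":
    pred, gt = (0, 0, 2, 2), (1, 1, 3, 3)
    print(alpha_giou_loss(pred, gt, alpha=1.0))  # plain GIoU loss
    print(alpha_giou_loss(pred, gt))             # alpha = 3
    print(quality_focal_loss(0.9, 0.1))          # over-confident score on a poor box
```

In this sketch the classification target is the localization quality (IoU) rather than a hard 0/1 label. That is one common way to make the ranking score agree with box quality, which is the inconsistency the abstract says QAFL addresses.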