
Toward Robust Visual Object Tracking With Independent Target-Agnostic Detection and Effective Siamese Cross-Task Interaction.

Author Information

Xu Tianyang, Feng Zhenhua, Wu Xiao-Jun, Kittler Josef

Publication Information

IEEE Trans Image Process. 2023;32:1541-1554. doi: 10.1109/TIP.2023.3246800. Epub 2023 Mar 6.

Abstract

Advanced Siamese visual object tracking architectures are jointly trained using pair-wise input images to perform target classification and bounding box regression. They have achieved promising results in recent benchmarks and competitions. However, the existing methods suffer from two limitations: First, although the Siamese structure can estimate the target state in an instance frame provided the target appearance does not deviate too much from the template, the detection of the target in an image cannot be guaranteed in the presence of severe appearance variations. Second, despite the classification and regression tasks sharing the same output from the backbone network, their specific modules and loss functions are invariably designed independently, without promoting any interaction. Yet, in a general tracking task, the centre classification and bounding box regression tasks work collaboratively to estimate the final target location. To address the above issues, it is essential to perform target-agnostic detection so as to promote cross-task interactions in a Siamese-based tracking framework. In this work, we endow a novel network with a target-agnostic object detection module to complement the direct target inference, and to avoid or minimise the misalignment of the key cues of potential template-instance matches. To unify the multi-task learning formulation, we develop a cross-task interaction module that ensures consistent supervision of the classification and regression branches, improving their synergy. To eliminate potential inconsistencies that may arise within a multi-task architecture, we assign adaptive labels, rather than fixed hard labels, to supervise the network training more effectively. The experimental results obtained on several benchmarks, i.e., OTB100, UAV123, VOT2018, VOT2019, and LaSOT, demonstrate the effectiveness of the advanced target detection module, as well as the cross-task interaction, exhibiting superior tracking performance compared with state-of-the-art tracking methods.
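
The abstract describes a Siamese head in which the classification and regression branches share the backbone's correlation features and exchange information through a cross-task interaction module. The paper's exact design is not given here, so the following is a minimal PyTorch sketch under assumed choices: depth-wise cross-correlation for template-instance matching and a gating-based fusion for the interaction. All module names, channel sizes, and the fusion scheme are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of a Siamese head with cross-task interaction.
# Assumed design choices: depth-wise cross-correlation and sigmoid-gated fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F


def xcorr_depthwise(search, kernel):
    """Depth-wise cross-correlation between search-region and template features."""
    b, c, h, w = search.shape
    out = F.conv2d(
        search.reshape(1, b * c, h, w),
        kernel.reshape(b * c, 1, *kernel.shape[2:]),
        groups=b * c,
    )
    return out.reshape(b, c, *out.shape[2:])


class CrossTaskInteraction(nn.Module):
    """Exchanges information between the classification and regression branches
    so both receive consistent supervision (an assumed gating-based fusion)."""

    def __init__(self, channels):
        super().__init__()
        self.cls_to_reg = nn.Conv2d(channels, channels, 1)
        self.reg_to_cls = nn.Conv2d(channels, channels, 1)

    def forward(self, cls_feat, reg_feat):
        cls_out = cls_feat + torch.sigmoid(self.reg_to_cls(reg_feat)) * cls_feat
        reg_out = reg_feat + torch.sigmoid(self.cls_to_reg(cls_feat)) * reg_feat
        return cls_out, reg_out


class SiameseHead(nn.Module):
    """Centre classification and bounding-box regression heads that share the
    correlation features computed from the backbone outputs."""

    def __init__(self, channels=256):
        super().__init__()
        self.cls_tower = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.reg_tower = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.interact = CrossTaskInteraction(channels)
        self.cls_pred = nn.Conv2d(channels, 1, 3, padding=1)  # centre score map
        self.reg_pred = nn.Conv2d(channels, 4, 3, padding=1)  # l, t, r, b offsets

    def forward(self, template_feat, search_feat):
        corr = xcorr_depthwise(search_feat, template_feat)
        cls_feat, reg_feat = self.interact(self.cls_tower(corr), self.reg_tower(corr))
        return self.cls_pred(cls_feat), self.reg_pred(reg_feat)
```

The gating lets each branch modulate the other's features before prediction, which is one simple way to realise the "consistent supervision" the abstract refers to; the paper may use a different fusion.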
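The abstract also mentions assigning adaptive labels rather than fixed hard labels. A common way to realise this, sketched below as an assumption rather than the authors' actual rule, is to soften each positive location's classification target by the quality (here, IoU) of its regressed box, so the classification branch is supervised in agreement with the regression branch.

```python
# Hedged sketch of adaptive label assignment: soft classification targets
# weighted by box quality (IoU is an assumed choice of quality measure).
import torch


def box_iou(pred, gt):
    """IoU between per-location boxes pred (N, 4) and one gt box (4,),
    both in (x1, y1, x2, y2) format."""
    x1 = torch.maximum(pred[:, 0], gt[0])
    y1 = torch.maximum(pred[:, 1], gt[1])
    x2 = torch.minimum(pred[:, 2], gt[2])
    y2 = torch.minimum(pred[:, 3], gt[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_pred = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_pred + area_gt - inter + 1e-6)


def adaptive_cls_labels(pred_boxes, gt_box, pos_mask):
    """Soft classification labels: positives are weighted by the quality of
    their regressed boxes; negatives stay at zero."""
    labels = torch.zeros(pred_boxes.shape[0], device=pred_boxes.device)
    labels[pos_mask] = box_iou(pred_boxes[pos_mask], gt_box)
    return labels
```

Soft targets of this kind can be plugged into a binary cross-entropy loss directly, so a location whose box barely overlaps the ground truth contributes a weaker positive signal than a well-aligned one.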

