IEEE Trans Pattern Anal Mach Intell. 2021 May;43(5):1467-1482. doi: 10.1109/TPAMI.2019.2952590. Epub 2021 Apr 1.
Visual Active Tracking (VAT) aims at following a target object by autonomously controlling the motion system of a tracker given visual observations. To learn a robust tracker for VAT, we propose in this article a novel adversarial reinforcement learning (RL) method that adopts an asymmetric dueling mechanism, referred to as AD-VAT. In this mechanism, the tracker and the target are viewed as two learnable agents that act as opponents and enhance each other through the duel: the tracker tries to lock onto the target, while the target tries to escape from the tracker. The dueling is asymmetric in that the target is additionally fed the tracker's observation and action, and learns to predict the tracker's reward as an auxiliary task. Such an asymmetric dueling mechanism produces a stronger target, which in turn induces a more robust tracker. To improve the tracker's performance in challenging scenarios, such as environments with obstacles, we employ a more advanced environment augmentation technique and two-stage training strategies; the resulting method is termed AD-VAT+. For a better understanding of the asymmetric dueling mechanism, we also analyze the target's behaviors as training proceeds and visualize the latent space of the tracker. The experimental results, in both 2D and 3D environments, demonstrate that the proposed method leads to faster convergence in training and yields more robust tracking behaviors in different testing scenarios. The potential of the active tracker is also shown in real-world videos.
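The asymmetric dueling mechanism described in the abstract can be illustrated with a minimal Python sketch. It shows only the structure the abstract states: the target receives the tracker's observation and action as extra input, and it is trained with an auxiliary loss for predicting the tracker's reward. Everything else is an assumption made for illustration and is not taken from the paper: the names `TargetInput`, `tracker_reward`, `target_reward`, `auxiliary_loss`, and `target_total_loss`, the reward formula, the purely zero-sum target reward, the squared-error loss, and the `aux_weight` parameter.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TargetInput:
    """What the target agent sees at one step (the asymmetric part)."""
    own_observation: Sequence[float]
    tracker_observation: Sequence[float]  # extra input, as stated in the abstract
    tracker_action: int                   # extra input, as stated in the abstract


def tracker_reward(distance: float, angle: float,
                   best_distance: float = 1.0) -> float:
    """Hypothetical reward, not the paper's formula: highest when the target
    is at the desired distance and straight ahead of the tracker."""
    return 1.0 - abs(distance - best_distance) - abs(angle)


def target_reward(r_tracker: float) -> float:
    """Assumption: a purely zero-sum duel, so the target gains exactly
    what the tracker loses."""
    return -r_tracker


def auxiliary_loss(predicted_tracker_reward: float, r_tracker: float) -> float:
    """Squared error of the target's prediction of the tracker's reward
    (the loss form is an assumption)."""
    return (predicted_tracker_reward - r_tracker) ** 2


def target_total_loss(rl_loss: float, aux_loss: float,
                      aux_weight: float = 0.5) -> float:
    """Assumed combination: the target's RL loss plus a weighted auxiliary term."""
    return rl_loss + aux_weight * aux_loss
```

In this sketch, a step where the target is 1.5 units away at an angle of 0.25 gives the tracker a reward of 0.25 and the target a reward of -0.25. If the target had predicted a tracker reward of 0.75, its auxiliary loss would be 0.25, which `target_total_loss` adds to the RL loss with the assumed weight.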