Rochester Institute of Technology, Rochester, NY 14623, USA.
Sensors (Basel). 2020 Jan 19;20(2):547. doi: 10.3390/s20020547.
In recent years, deep learning-based visual object trackers have achieved state-of-the-art performance on several visual object tracking benchmarks. However, most tracking benchmarks focus on ground-level videos, whereas aerial tracking presents a new set of challenges. In this paper, we compare ten deep learning-based trackers on four aerial datasets. We choose top-performing trackers that use different approaches, specifically tracking by detection, discriminative correlation filters, Siamese networks, and reinforcement learning. As benchmark datasets, we use a subset of the OTB2015 dataset with aerial-style videos; the UAV123 dataset without synthetic sequences; the UAV20L dataset, which contains 20 long sequences; and the DTB70 dataset. We compare the advantages and disadvantages of the trackers in the different tracking situations encountered in aerial data. Our findings indicate that the trackers perform significantly worse on aerial datasets than on standard ground-level videos. We attribute this effect to smaller target size, camera motion, significant camera rotation with respect to the target, out-of-view movement, and clutter in the form of occlusions or similar-looking distractors near the tracked object.