School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China.
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China.
Neural Netw. 2021 Oct;142:316-328. doi: 10.1016/j.neunet.2021.05.003. Epub 2021 May 15.
Recently, tracking models that perform bounding box regression (such as region proposal networks) on top of a Siamese network have attracted much attention. Despite their promising performance, these trackers are less effective at perceiving target information in two respects. First, existing regression models cannot take a global view of a large-scale target, since the effective receptive field of a neuron is too small to cover it. Second, the neurons in these models have a fixed receptive field (RF) size and therefore cannot adapt to changes in the scale and aspect ratio of the target. In this paper, we propose an adaptive ensemble perception tracking framework to address these issues. Specifically, we first construct a per-pixel prediction model, which predicts the target state at each pixel of the correlated feature. On top of the per-pixel prediction model, we then develop a confidence-guided ensemble prediction mechanism. Guided by confidence maps, the ensemble mechanism adaptively fuses the predictions of multiple pixels, which enlarges the perception range and enhances the adaptive perception ability at the object level. In addition, we introduce a receptive field adaption model that enhances the adaptive perception ability at the neuron level by adaptively integrating features with different RFs. Extensive experimental results on the VOT2018, VOT2016, UAV123, LaSOT, and TC128 datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy and speed.
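To make the idea of confidence-guided ensemble prediction more concrete, the sketch below fuses several per-pixel box predictions into one box using confidence scores as weights. The abstract does not state the actual fusion rule, so everything here is an assumption made for illustration: the function name `fuse_box_predictions`, the `top_k` parameter, the `(x1, y1, x2, y2)` box format, and the choice of a normalized weighted average over the most confident pixels. It should not be read as the authors' method.

```python
def fuse_box_predictions(boxes, confidences, top_k=3):
    """Fuse per-pixel box predictions by a confidence-weighted average.

    boxes:       list of (x1, y1, x2, y2) tuples, one per pixel (assumed format).
    confidences: list of non-negative scores, one per pixel.
    top_k:       hypothetical parameter, number of most confident pixels to fuse.
    """
    if not boxes or len(boxes) != len(confidences):
        raise ValueError("boxes and confidences must be non-empty and equal in length")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Rank pixel indices from most to least confident and keep the best top_k.
    ranked = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    selected = ranked[:top_k]

    # Normalize the selected confidences into weights that sum to 1.
    total = sum(confidences[i] for i in selected)
    if total <= 0:
        # Fall back to a plain average when no pixel carries any confidence.
        weights = [1.0 / len(selected)] * len(selected)
    else:
        weights = [confidences[i] / total for i in selected]

    # Weighted average of each of the four box coordinates.
    return tuple(
        sum(w * boxes[i][c] for w, i in zip(weights, selected))
        for c in range(4)
    )


if __name__ == "__main__":
    pixel_boxes = [(0, 0, 10, 10), (2, 2, 12, 12), (100, 100, 200, 200)]
    pixel_conf = [1.0, 3.0, 0.1]
    print(fuse_box_predictions(pixel_boxes, pixel_conf, top_k=2))
```

In this toy setup the low-confidence outlier box is excluded by `top_k=2`, and the fused box lies closer to the prediction with the higher confidence. This shows how combining several pixels can cover more of the target than a single pixel's prediction would.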