Liu Zhenqi, Wang Xinyu, Zhong Yanfei, Shu Meng, Sun Chen
IEEE Trans Image Process. 2022;31:7116-7129. doi: 10.1109/TIP.2022.3216995. Epub 2022 Nov 16.
Hyperspectral videos provide the spatial, spectral, and motion information of targets, making it possible to track camouflaged targets that are similar to the background. However, hyperspectral object tracking is a challenging task, due to the high dimensionality of hyperspectral video data and the "data hungry" problem in model training: insufficient training data can seriously degrade the accuracy and generalization of tracking models. In this paper, a dual deep Siamese network framework for hyperspectral object tracking (SiamHYPER) is proposed for learning a hyperspectral tracker from a pretrained RGB tracker under the "data hungry" problem. Specifically, in addition to a pretrained RGB-based Siamese tracker, a hyperspectral target-aware module is designed to mine the spectral information during target prediction, and a spatial-spectral cross-attention module is introduced to further fuse the deep spatial and spectral features extracted from the RGB tracker and the hyperspectral target-aware module. Benefiting from the guidance of the pretrained RGB tracker, a robust hyperspectral object tracker can be trained effectively with only a small number of hyperspectral video samples, overcoming the "data hungry" problem. In the experiments conducted in this study, the SiamHYPER framework was verified using SiamBAN and SiamRPN++, trained on 13 000 frames of hyperspectral video, and achieved the best performance on the publicly available hyperspectral dataset released as part of the WHISPERS Hyperspectral Object Tracking Challenge. The area under the curve (AUC) of SiamHYPER was nearly 8.9% and 7.2% higher, respectively, than that of the current state-of-the-art RGB-based and hyperspectral trackers. In addition, SiamHYPER runs at 19 FPS, which is much faster than the current state-of-the-art hyperspectral trackers.
The source code is available at https://github.com/zhenliuzhenqi/HOT.
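To make the fusion idea concrete, the following is a minimal, illustrative sketch of spatial-spectral cross-attention: spatial tokens from the RGB branch act as queries, while tokens from the hyperspectral branch supply keys and values. This is not the authors' implementation; the function name `cross_attention_fusion`, the token layout, and the random matrices standing in for learned projection weights are all assumptions made for illustration.

```python
import numpy as np


def _softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fusion(spatial_tokens, spectral_tokens, d_k=16, seed=0):
    """Illustrative cross-attention between two feature branches.

    spatial_tokens:  (N, C_rgb) deep spatial features (RGB branch).
    spectral_tokens: (N, C_hsi) spectral features (hyperspectral branch).
    Random Gaussian matrices stand in for learned query/key/value
    projections; a trained model would use learned weights instead.
    """
    rng = np.random.default_rng(seed)
    c_rgb = spatial_tokens.shape[1]
    c_hsi = spectral_tokens.shape[1]
    w_q = rng.standard_normal((c_rgb, d_k)) / np.sqrt(c_rgb)
    w_k = rng.standard_normal((c_hsi, d_k)) / np.sqrt(c_hsi)
    w_v = rng.standard_normal((c_hsi, d_k)) / np.sqrt(c_hsi)

    q = spatial_tokens @ w_q            # queries from the spatial branch
    k = spectral_tokens @ w_k           # keys from the spectral branch
    v = spectral_tokens @ w_v           # values from the spectral branch

    # Scaled dot-product attention: each spatial token attends over
    # all spectral tokens, producing a fused representation.
    attn = _softmax(q @ k.T / np.sqrt(d_k), axis=-1)
    return attn @ v, attn
```

In a full tracker, the fused features would feed the prediction head alongside the original spatial features; here the sketch only shows the attention mechanics on flattened feature maps.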