Wang Shilei, Wang Zhenhua, Sun Qianqian, Cheng Gong, Ning Jifeng
IEEE Trans Image Process. 2024;33:5073-5085. doi: 10.1109/TIP.2024.3453028. Epub 2024 Sep 17.
Recently, one-stream trackers have achieved parallel feature extraction and relation modeling by exploiting Transformer-based architectures. This design greatly improves tracking performance. However, because one-stream trackers often overlook crucial tracking cues beyond the template, they are prone to give unsatisfactory results in complex tracking scenarios. To tackle these challenges, we propose a multi-cue single-stream tracker, dubbed MCTrack, which seamlessly integrates template information, historical trajectory, historical frames, and the search region for synchronized feature extraction and relation modeling. To achieve this, we employ two types of encoders to convert the template, historical frames, search region, and historical trajectory into tokens, which are then collectively fed into a Transformer architecture. To distill temporal and spatial cues, we introduce a novel adaptive update mechanism, which incorporates a thresholding component and a local multi-peak component to filter out less accurate and overly disturbed tracking cues. Empirically, MCTrack achieves leading performance on mainstream benchmark datasets, surpassing the most advanced SeqTrack by 2.0% in terms of the AO metric on GOT-10k. The code is available at https://github.com/wsumel/MCTrack.
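The abstract does not spell out how the thresholding and local multi-peak components are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the tracker produces a 2D response map with scores in [0, 1]; the function names `count_peaks` and `should_update`, and the parameters `conf_threshold` and `rel_height`, are hypothetical. A cue is kept only if the top score is high enough (thresholding) and no competing peak, such as a distractor, appears in the map (multi-peak check).

```python
import numpy as np


def count_peaks(score_map, rel_height=0.5):
    """Count strict local maxima (8-neighbourhood) that reach at least
    rel_height times the global maximum. Hypothetical helper."""
    s = np.asarray(score_map, dtype=float)
    h, w = s.shape
    # Pad with -inf so border cells can still qualify as peaks.
    padded = np.pad(s, 1, mode="constant", constant_values=-np.inf)
    is_peak = np.ones((h, w), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= s > neighbour  # strictly greater than every neighbour
    is_peak &= s >= rel_height * s.max()
    return int(is_peak.sum())


def should_update(score_map, conf_threshold=0.7, rel_height=0.5):
    """Decide whether the current frame is a reliable cue (assumed rule)."""
    s = np.asarray(score_map, dtype=float)
    # Thresholding component: reject low-confidence predictions.
    if s.max() < conf_threshold:
        return False
    # Multi-peak component: reject maps with more than one strong peak.
    return count_peaks(s, rel_height) <= 1


clean = np.zeros((5, 5))
clean[2, 2] = 0.9            # single confident peak

distracted = clean.copy()
distracted[0, 4] = 0.8       # second strong peak, e.g. a similar object

weak = np.zeros((5, 5))
weak[2, 2] = 0.3             # low-confidence prediction

print(should_update(clean), should_update(distracted), should_update(weak))
```

Under these assumptions, only the `clean` map would be allowed to refresh the historical frame and trajectory cues; the threshold values are illustrative and would need tuning per benchmark.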