Liang Tianyi, Li Baopu, Wang Mengzhu, Tan Huibin, Luo Zhigang
IEEE Trans Image Process. 2023;32:267-280. doi: 10.1109/TIP.2022.3227814. Epub 2022 Dec 21.
Unifying object detection and re-identification (ReID) into a single network enables faster multi-object tracking (MOT), but this multi-task setting poses challenges for training. In this work, we dissect the joint training of detection and ReID along two dimensions: label assignment and loss function. We find that previous works generally overlook these aspects and directly borrow practices from object detection, inevitably causing inferior performance. Specifically, we identify that a qualified label assignment for MOT should: 1) make the assignment cost aware of the ReID cost, not just the detection cost; and 2) provide sufficient positive samples for robust feature learning while avoiding ambiguous positives (i.e., positives shared by different ground-truth objects). To achieve these goals, we first propose Identity-aware Label Assignment, which jointly considers the assignment costs of detection and ReID to select positive samples for each instance without ambiguity. Moreover, we introduce a novel Discriminative Focal Loss that integrates ReID predictions into Focal Loss to focus training on discriminative samples. Finally, we upgrade the strong FairMOT baseline with our techniques and achieve improvements of up to 7.0 MOTA / 54.1% IDs (identity switches) on the MOT16/17/20 benchmarks at a favorable inference speed, verifying that our label assignment and loss function tailored for MOT are superior to those inherited from object detection.
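To make the first idea concrete, below is a minimal sketch of an identity-aware assignment step. It assumes per-pair detection and ReID cost matrices are already computed, combines them linearly with assumed weights, takes the top-k lowest-cost candidates per ground truth, and resolves ambiguous positives (anchors claimed by multiple ground truths) to the lowest-cost claimant. The function name, weights, and cost definitions are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def identity_aware_assignment(det_cost, reid_cost, k=9, alpha=1.0, beta=1.0):
    """Sketch of an identity-aware label assignment (assumed form).

    det_cost, reid_cost: [num_gt, num_anchors] cost matrices (lower = better).
    Returns a [num_anchors] long tensor of assigned GT indices; -1 = negative.
    """
    num_gt, num_anchors = det_cost.shape
    assigned = torch.full((num_anchors,), -1, dtype=torch.long)
    if num_gt == 0:
        return assigned

    # Joint cost: assignment is aware of ReID, not just detection (linear mix assumed).
    total_cost = alpha * det_cost + beta * reid_cost

    # Top-k lowest-cost anchors per ground truth become candidate positives,
    # giving each instance sufficient positives for feature learning.
    idx = total_cost.topk(min(k, num_anchors), dim=1, largest=False).indices
    candidate = torch.zeros_like(total_cost, dtype=torch.bool)
    candidate.scatter_(1, idx, torch.ones_like(idx, dtype=torch.bool))

    # Remove ambiguity: an anchor claimed by several GTs goes to its lowest-cost GT.
    masked_cost = total_cost.masked_fill(~candidate, float("inf"))
    best_gt = masked_cost.argmin(dim=0)
    claimed = candidate.any(dim=0)
    assigned[claimed] = best_gt[claimed]
    return assigned

# Toy usage: 3 ground truths, 20 candidate anchors with random costs.
det = torch.rand(3, 20)
reid = torch.rand(3, 20)
print(identity_aware_assignment(det, reid, k=4))  # -1 or assigned GT index
```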
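For the second idea, the sketch below shows one plausible way to integrate ReID predictions with Focal Loss, as the abstract describes: the standard sigmoid focal term is kept, and positive samples are re-weighted by the ReID classifier's probability for the correct identity so that training concentrates on identity-discriminative samples. The weighting scheme and the name reid_probs are assumptions for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def discriminative_focal_loss(cls_logits, targets, reid_probs,
                              gamma=2.0, alpha=0.25):
    """Sketch of a ReID-aware focal loss (assumed integration).

    cls_logits, targets: detection classification logits and 0/1 targets.
    reid_probs: probability assigned to the correct identity, in [0, 1],
                same shape as targets (assumption for this sketch).
    """
    p = torch.sigmoid(cls_logits)
    ce = F.binary_cross_entropy_with_logits(cls_logits, targets, reduction="none")

    # Standard focal modulation: down-weight easy, well-classified samples.
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    focal = alpha_t * ce * (1 - p_t) ** gamma

    # Assumed ReID coupling: emphasize positives whose embeddings are
    # already identity-discriminative (high correct-identity probability).
    weight = 1.0 + reid_probs * targets
    return (weight * focal).mean()

# Toy usage with random logits/targets and ReID probabilities.
logits = torch.randn(8)
targets = torch.randint(0, 2, (8,)).float()
reid_probs = torch.rand(8)
print(discriminative_focal_loss(logits, targets, reid_probs))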