IEEE Trans Image Process. 2021;30:5287-5298. doi: 10.1109/TIP.2021.3082298. Epub 2021 Jun 2.
In recent years, person re-identification (re-ID) has achieved relatively good performance, benefiting from the revival of deep neural networks. However, because of domain bias, that is, the difference in data distribution between two domains, it remains challenging to directly deploy a model trained on a labeled source domain to a target domain where only unlabeled data are available. This paper proposes a Self-Training with Progressive Representation Enhancement (PREST) framework, comprising a multi-scale self-training method and a view-invariant representation learning module, to improve re-ID performance on the target domain in an unsupervised manner. Specifically, multi-scale representations, covering both the global body and local parts of pedestrian images, are used to obtain pseudo-labels. A subset of images is then selected according to these pseudo-labels to form a new dataset that supervises fine-tuning, and this process is repeated iteratively to improve performance progressively. Furthermore, to mitigate the influence of style differences among sub-domains, where each sub-domain consists of the images captured by a single camera, a classifier with a gradient reversal layer is first employed to learn view-invariant representations of pedestrian images that share an identity but are taken by different cameras; this further improves the reliability of the predicted labels and the cross-domain re-ID performance. Extensive experiments on three large-scale re-ID datasets demonstrate that our framework achieves significantly better performance than existing approaches.
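The gradient reversal idea used for view-invariant learning can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `GradientReversal`, the strength parameter `lam`, and the toy feature and gradient values are all hypothetical, and the code uses plain Python lists in place of a deep learning framework. It shows only the defining behavior of such a layer: features pass through unchanged on the forward pass, while the gradient coming back from the camera classifier is negated and scaled before it reaches the feature extractor.

```python
class GradientReversal:
    """Identity on the forward pass; negated, scaled gradient on the backward pass."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical reversal strength, not a value from the paper.
        self.lam = lam

    def forward(self, features):
        # Features reach the camera classifier unchanged.
        return list(features)

    def backward(self, grad_from_classifier):
        # The sign flip pushes the feature extractor to make the camera
        # harder to predict, which encourages camera-invariant features.
        return [-self.lam * g for g in grad_from_classifier]


# Toy values chosen for illustration only.
grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
camera_grad = [0.5, -2.0, 0.0]

out = grl.forward(features)
grad_to_extractor = grl.backward(camera_grad)
```

In a real training setup the same behavior would typically be expressed as a custom autograd function, so that the camera classifier minimizes its loss while the feature extractor, receiving the reversed gradient, is trained against it.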