Ji Haoxuanye, Wang Le, Zhou Sanping, Tang Wei, Hua Gang
IEEE Trans Image Process. 2024;33:5144-5158. doi: 10.1109/TIP.2024.3456008. Epub 2024 Sep 19.
Unsupervised person re-identification (Re-ID) is challenging due to the lack of ground truth labels. Most existing methods employ iterative clustering to generate pseudo labels for unlabeled training data to guide the learning process. However, how to select samples that both carry high-confidence pseudo labels and are sufficiently hard (i.e., discriminative) remains a critical problem. To address this issue, a disentangled sample guidance learning (DSGL) method is proposed for unsupervised Re-ID. The method consists of disentangled sample mining (DSM) and discriminative feature learning (DFL). DSM disentangles (unlabeled) person images into identity-relevant and identity-irrelevant factors, which are used to construct disentangled positive/negative groups that carry sufficiently discriminative information. DFL incorporates the mined disentangled sample groups into model training via a surrogate disentangled learning loss and a disentangled second-order similarity regularization, helping the model better distinguish the characteristics of different persons. With the DSGL training strategy, the mAP on Market-1501 and MSMT17 increases by 6.6% and 10.1% with a ResNet50 backbone, and by 0.6% and 6.9% with a vision transformer (ViT) backbone, respectively, validating the effectiveness of the DSGL method. Moreover, DSGL surpasses previous state-of-the-art methods, achieving higher Top-1 accuracy and mAP on the Market-1501, MSMT17, PersonX, and VeRi-776 datasets. The source code for this paper is available at https://github.com/jihaoxuanye/DiseSGL.
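The abstract describes two training signals: a contrastive objective over pseudo-labeled positive/negative groups built from identity-relevant features, and a second-order similarity regularizer. The following PyTorch-style sketch illustrates that general idea only; it is not the authors' implementation (see the repository above). The names split_embedding and dsgl_style_loss, the half/half feature split used as a disentanglement proxy, and the temperature and lambda values are all illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def split_embedding(feat):
        # Assumed disentanglement proxy: treat the first half of each
        # embedding as identity-relevant, the second half as
        # identity-irrelevant (the paper learns this split; we fake it).
        d = feat.size(1) // 2
        return feat[:, :d], feat[:, d:]

    def dsgl_style_loss(feats, pseudo_labels, temp=0.5, lam=0.1):
        id_rel, _ = split_embedding(feats)
        id_rel = F.normalize(id_rel, dim=1)

        # First-order similarities between identity-relevant parts.
        sim = id_rel @ id_rel.t() / temp                     # (B, B)
        pos_mask = pseudo_labels[:, None].eq(pseudo_labels[None, :]).float()
        pos_mask.fill_diagonal_(0)

        # Contrastive term: pull together samples sharing a pseudo label
        # (positive group), push apart samples from other clusters.
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        contrastive = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)

        # Second-order term: samples with the same pseudo label should
        # also relate to all *other* samples in a similar way, i.e. their
        # rows of the similarity matrix should align.
        sim_rows = F.normalize(sim, dim=1)
        row_sim = sim_rows @ sim_rows.t()                    # (B, B)
        second_order = ((1 - row_sim) * pos_mask).sum() / pos_mask.sum().clamp(min=1)

        return contrastive.mean() + lam * second_order

    # Toy usage with random features and cluster-style pseudo labels:
    feats = torch.randn(8, 256)
    labels = torch.randint(0, 3, (8,))
    loss = dsgl_style_loss(feats, labels)

In this sketch the second-order term compares rows of the similarity matrix rather than raw features, which is one common way to encode "similarity of similarities"; the paper's disentangled second-order regularization may differ in detail.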