School of Artificial Intelligence, South China Normal University, Foshan 528225, China.
School of Computer Science, South China Normal University, Guangzhou 510631, China.
Neural Netw. 2024 Sep;177:106382. doi: 10.1016/j.neunet.2024.106382. Epub 2024 May 9.
Occluded person re-identification (Re-ID) is a challenging task, as pedestrians are often obstructed by various occlusions, such as non-pedestrian objects or non-target pedestrians. Previous methods have relied heavily on auxiliary models, such as human pose estimation, to obtain information in unoccluded regions. However, these auxiliary models fall short in accounting for pedestrian occlusions, which can lead to misrepresentations. In addition, some previous works learned feature representations from single images, ignoring the potential relations among samples. To address these issues, this paper introduces a Multi-Level Relation-Aware Transformer (MLRAT) model for occluded person Re-ID. The model mainly comprises two novel modules: Patch-Level Relation-Aware (PLRA) and Sample-Level Relation-Aware (SLRA). PLRA learns fine-grained local features by modeling the structural relations between key patches, removing the dependency on auxiliary models. It adopts a model-free method to select key patches that have high semantic correlation with the final pedestrian representation. In particular, to alleviate the interference of occlusion, PLRA captures the structural relations among key patches via a two-layer Graph Convolutional Network (GCN), effectively guiding local feature fusion and learning. SLRA is designed to help the model learn discriminative features by modeling the relations among samples. Specifically, to mitigate noisy relations with irrelevant samples, we present a Relation-Aware Transformer (RAT) block to capture the relations among neighbors. Furthermore, to bridge the gap between the training and testing phases, a self-distillation method is employed to transfer the sample-level relations captured by SLRA to the backbone. Extensive experiments are conducted on four occluded datasets, two partial datasets and two holistic datasets.
The results show that the proposed MLRAT model significantly outperforms existing baselines on four occluded datasets, while maintaining top performance on two partial datasets and two holistic datasets.
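The patch-level idea described in the abstract (select key patches by their correlation with the pedestrian representation, then relate them with a two-layer GCN) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how semantic correlation is measured or how the graph is built, so the cosine-similarity scoring, the similarity-based adjacency with self-loops, and all names below (`select_key_patches`, `normalized_adjacency`, `plra_sketch`, `w1`, `w2`) are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def select_key_patches(patches, global_feat, k):
    """Assumed selection rule: keep the k patches most similar to the global feature."""
    scores = [cosine(p, global_feat) for p in patches]
    order = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])  # return indices in their original spatial order


def normalized_adjacency(feats):
    """Assumed graph: non-negative cosine similarity off the diagonal, self-loops
    of weight 1 on the diagonal, then symmetric normalization D^-1/2 A D^-1/2."""
    n = len(feats)
    a = [[1.0 if i == j else max(0.0, cosine(feats[i], feats[j]))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]  # each degree is at least 1 due to the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(adj, feats, weight, relu=True):
    """One graph-convolution layer: adj @ feats @ weight, optionally followed by ReLU."""
    out = matmul(matmul(adj, feats), weight)
    if relu:
        out = [[max(0.0, v) for v in row] for row in out]
    return out


def plra_sketch(patches, global_feat, k, w1, w2):
    """Select key patches, then pass them through two GCN layers."""
    idx = select_key_patches(patches, global_feat, k)
    key = [patches[i] for i in idx]
    adj = normalized_adjacency(key)
    hidden = gcn_layer(adj, key, w1)
    refined = gcn_layer(adj, hidden, w2, relu=False)
    return idx, refined


# Toy usage: four 2-D patch features, a 2-D global feature, identity weights.
patches = [[1.0, 0.9], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
global_feat = [1.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
key_idx, refined = plra_sketch(patches, global_feat, 2, identity, identity)
```

In a real model the patch features and the global feature would come from a Vision Transformer backbone and the weights `w1` and `w2` would be learned; the toy values here only show the data flow.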