Li Yanping, Liu Yizhang, Zhang Hongyun, Zhao Cairong, Wei Zhihua, Miao Duoqian
IEEE Trans Image Process. 2024;33:3200-3211. doi: 10.1109/TIP.2024.3393360. Epub 2024 May 6.
Person re-identification (ReID) typically encounters varying degrees of occlusion in real-world scenarios. Previous methods have addressed this with handcrafted partitions or external cues, but they often compromise semantic information or increase network complexity. In this paper, we propose a new method, termed OAT, that approaches the problem from a novel perspective. Specifically, we first use a Transformer backbone with multiple class tokens for diverse pedestrian feature learning. Because the self-attention mechanism in the Transformer focuses solely on low-level feature correlations and neglects higher-order relations among different body parts or regions, we propose the Second-Order Attention (SOA) module to capture more comprehensive features. To keep the computation efficient, we further derive approximation formulations for implementing second-order attention. Observing that the importance of the semantics associated with different class tokens varies with the uncertain location and size of the occlusion, we propose the Entropy Guided Fusion (EGF) module for multiple class tokens. An uncertainty analysis is conducted on each class token: tokens with lower information entropy are assigned higher weights, and tokens with higher entropy are assigned lower weights. This dynamic weight adjustment mitigates the impact of occlusion-induced uncertainty on feature learning, thereby facilitating the acquisition of discriminative class token representations. Extensive experiments on occluded and holistic person re-identification datasets demonstrate the effectiveness of the proposed method.
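The entropy-based weighting described for the EGF module can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything here is an assumption: each class token is assumed to come with its own identity logits, entropy is computed on the softmax of those logits, and the weights are a softmax over negative entropies. The function names (`softmax`, `entropy`, `entropy_guided_fusion`) are hypothetical and are not taken from the paper.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy (natural log) of a probability vector.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_guided_fusion(tokens, logits):
    """Fuse K class tokens, weighting low-entropy tokens more heavily.

    tokens: list of K feature vectors (one per class token).
    logits: list of K identity-logit vectors (assumed: one classifier
            output per class token; this pairing is an assumption).
    """
    entropies = [entropy(softmax(l)) for l in logits]
    # Assumed weighting: softmax over negative entropy, so that a
    # lower entropy yields a higher weight.
    weights = softmax([-h for h in entropies])
    dim = len(tokens[0])
    fused = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
    return fused, weights, entropies

# Toy usage: token 0 has a confident prediction, token 1 is nearly uniform
# (as might happen when the region it attends to is occluded).
tokens = [[1.0, 0.0], [0.0, 1.0]]
logits = [[8.0, 0.0, 0.0], [0.1, 0.0, 0.1]]
fused, weights, entropies = entropy_guided_fusion(tokens, logits)
```

In this toy case the confident token receives the larger weight, so the fused feature is dominated by it. Other monotone mappings from entropy to weight (for example, normalized inverse entropy) would fit the abstract's description equally well.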