Yu Zhi, Huang Zhiyong, Hou Mingyang, Pei Jiaming, Yan Yan, Liu Yushi, Sun Daming
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China; Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education of China, Chongqing University, Chongqing, 400044, China.
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China.
Neural Netw. 2025 Jul;187:107394. doi: 10.1016/j.neunet.2025.107394. Epub 2025 Mar 20.
Recently, transformer-based methods have shown remarkable success in object re-identification. However, most works directly embed off-the-shelf transformer backbones for feature extraction. These methods treat all patch tokens equally, ignoring the differing importance of distinct patch tokens for feature representation. To address this issue, this paper designs a feature-tuning mechanism for transformer backbones that emphasizes important patches and attenuates unimportant ones. Specifically, a Feature-tuning Hierarchical Transformer (FHTrans) for object re-identification is proposed. First, we propose a plug-and-play Feature-tuning module via Token Communication (TCF) deployed within transformer encoder blocks. This module regards the class token as a pivot for communication between patch tokens: important patch tokens are emphasized while unimportant ones are attenuated, focusing more precisely on the discriminative features relevant to distinguishing objects. Then, we construct FHTrans on top of the designed feature-tuning module. The encoder blocks are divided into three hierarchies according to the correlation between feature representativeness and transformer depth; as the hierarchy deepens, communication between tokens becomes tighter, enabling the model to capture more crucial feature information. Finally, we propose a Sample Aggregation (SA) loss that imposes more effective constraints on the statistical characteristics of samples, enhancing intra-class aggregation and guiding FHTrans to learn more discriminative features. Experiments on object re-identification benchmarks demonstrate that our method achieves state-of-the-art performance.
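The abstract does not give the exact formulation of the TCF module; the sketch below is one plausible reading of "class token as a pivot for communication between patch tokens," assuming cosine similarity to the class token followed by a softmax-derived gate that amplifies well-aligned patches and attenuates the rest. The function name and all numerical details are illustrative, not the authors' implementation.

```python
import numpy as np

def token_communication_feature_tuning(tokens):
    """Reweight patch tokens using the class token as a pivot.

    tokens: array of shape (N + 1, D); row 0 is the class token,
    rows 1..N are patch tokens. Returns an array of the same shape.
    """
    cls, patches = tokens[0], tokens[1:]
    # Cosine similarity between the class token and each patch token.
    sims = patches @ cls / (
        np.linalg.norm(patches, axis=1) * np.linalg.norm(cls) + 1e-8
    )
    # Softmax over patches gives normalized importance weights.
    w = np.exp(sims - sims.max())
    w /= w.sum()
    # Rescale so the average gate is 1: patches similar to the class
    # token are amplified, dissimilar ones are attenuated.
    w = w * len(w)
    tuned = patches * w[:, None]
    # The class token itself passes through unchanged.
    return np.vstack([cls[None, :], tuned])
```

Because the gate averages to 1, the module only redistributes magnitude among patch tokens rather than changing the overall feature scale, which is consistent with a plug-and-play block inserted inside existing encoder layers.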
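The abstract states only that the SA loss constrains "statistical characteristics of samples" to enhance intra-class aggregation, without giving the formula. A minimal sketch, assuming one common way to realize such a constraint (a centroid-based intra-class compactness penalty; the function name and the centroid formulation are assumptions, not the paper's definition):

```python
import numpy as np

def sample_aggregation_loss(features, labels):
    """Mean squared distance of each embedding to its class centroid.

    features: array of shape (B, D) of sample embeddings.
    labels:   array of shape (B,) of integer identity labels.
    Smaller values mean tighter intra-class clusters.
    """
    loss, count = 0.0, 0
    for c in np.unique(labels):
        group = features[labels == c]
        centroid = group.mean(axis=0)           # per-class statistic
        loss += ((group - centroid) ** 2).sum()  # spread around it
        count += len(group)
    return loss / count
```

Minimizing this quantity pulls samples of the same identity toward their class mean, which matches the stated goal of enhancing intra-class aggregation; the actual SA loss may use different or additional statistics.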