

TE-TransReID: Towards Efficient Person Re-Identification via Local Feature Embedding and Lightweight Transformer.

Authors

Zhang Xiaoyu, Cai Rui, Jiang Ning, Xing Minwen, Xu Ke, Yang Huicheng, Zhu Wenbo, Hu Yaocong

Affiliations

School of Electrical Engineering, Anhui Polytechnic University, Beijing Road No. 8, Wuhu 241000, China.

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.

Publication

Sensors (Basel). 2025 Sep 3;25(17):5461. doi: 10.3390/s25175461.

Abstract

Person re-identification aims to match images of the same individual across non-overlapping cameras by analyzing personal characteristics. Recently, Transformer-based models have demonstrated excellent capabilities and achieved breakthrough progress in this task. However, their high computational costs and inadequate capacity to capture fine-grained local features impose significant constraints on re-identification performance. To address these challenges, this paper proposes a novel Toward Efficient Transformer-based Person Re-identification (TE-TransReID) framework. Specifically, the proposed framework retains only the first L layers of a pretrained Vision Transformer (ViT) for global feature extraction while combining local features extracted from a pretrained CNN, thus achieving a trade-off between high accuracy and a lightweight network. Additionally, we propose a dual efficient feature-fusion strategy to integrate global and local features for accurate person re-identification. The Efficient Token-based Feature-Fusion Module (ETFFM) employs a gate-based network to learn fused token-wise features, while the Efficient Patch-based Feature-Fusion Module (EPFFM) utilizes a lightweight Transformer to aggregate patch-level features. Finally, TE-TransReID achieves rank-1 accuracies of 94.8%, 88.3%, and 85.7% on Market1501, DukeMTMC, and MSMT17, respectively, with only 27.5 M parameters. Compared to existing CNN-Transformer hybrid models, TE-TransReID maintains comparable recognition accuracy while drastically reducing model parameters, striking a favorable balance between recognition accuracy and computational efficiency.
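The abstract does not spell out the ETFFM equations, but a gate-based fusion of a global (ViT) token with a local (CNN) token is commonly implemented as a learned sigmoid gate that interpolates between the two feature vectors. The following minimal, dependency-free Python sketch illustrates that idea; all names, dimensions, and the exact gating form are hypothetical, not taken from the paper:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    # Plain matrix-vector product over nested lists.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def gated_token_fusion(g, l, W, b):
    """Sketch of gate-based token fusion (hypothetical form):
    gate = sigmoid(W [g; l] + b), fused = gate * g + (1 - gate) * l.
    Each output dimension is a convex combination of the two inputs."""
    z = g + l  # concatenate global and local tokens
    gate = [sigmoid(v + bi) for v, bi in zip(matvec(W, z), b)]
    return [s * gi + (1.0 - s) * li for s, gi, li in zip(gate, g, l)]

random.seed(0)
d = 4
g = [random.gauss(0, 1) for _ in range(d)]            # global ViT token (toy)
l = [random.gauss(0, 1) for _ in range(d)]            # local CNN descriptor (toy)
W = [[random.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d
fused = gated_token_fusion(g, l, W, b)
print(len(fused))
```

Because the gate lies in (0, 1), each fused dimension stays between the corresponding global and local values, which is what makes this kind of fusion stable to train.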


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c40a/12431027/f097e823a9d6/sensors-25-05461-g001.jpg
