Guo Yuwei, Zhang Wenhao, Jiao Licheng, Wang Shuang, Wang Shuo, Liu Fang
Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education of China, School of Artificial Intelligence, International Research Center of Intelligent Perception and Computation, Xidian University, Xi'an, 710071, China.
School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK.
Sci Rep. 2025 May 25;15(1):18225. doi: 10.1038/s41598-025-01979-z.
Visible-infrared person re-identification (VI-ReID) aims to search for the same pedestrian of interest across the visible and infrared modalities. Existing models mainly focus on compensating for modality-specific information to reduce modality variation. However, these methods often introduce interfering information and incur higher computational overhead when generating the corresponding images or features. Additionally, the characteristics of pedestrian regions in VI-ReID are not effectively utilized, resulting in ambiguous or unnatural images. To address these issues, it is critical to leverage pedestrian-attentive features and to learn modality-complete and modality-consistent representations. In this paper, we propose a novel Region-based Augmentation and Cross Modality Attention (RACA) model that focuses on pedestrian regions to efficiently compensate for missing modality-specific features. Specifically, we propose a region-based data augmentation module, PedMix, which enhances pedestrian region coherence by mixing the corresponding regions from different modalities, thus generating more natural images. Moreover, we propose a lightweight hybrid compensation module, the Modality Feature Transfer (MFT) module, which integrates cross attention and convolutional networks to avoid introducing interfering information while keeping computational overhead minimal. Extensive experiments on the benchmark SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed RACA model.
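The abstract describes PedMix only at a high level. As a minimal sketch of what region-based cross-modality mixing could look like, the NumPy code below copies a random horizontal stripe of the pedestrian region from an infrared image into a spatially aligned visible image of the same identity. The function name `ped_mix`, the stripe-shaped region, the ratio `lam`, and the availability of a pedestrian mask `mask` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def ped_mix(visible, infrared, mask, lam=0.5, rng=None):
    """Hypothetical PedMix-style augmentation (illustrative sketch only).

    visible, infrared: float arrays of shape (H, W, C) for the same identity,
        assumed to be spatially aligned.
    mask: bool array of shape (H, W), True on pedestrian pixels (assumed to
        come from an external segmentation or box; not specified in the paper).
    lam: fraction of the pedestrian's height whose rows are taken from infrared.
    Returns the mixed image and the bool mask of pixels that were swapped.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows = np.flatnonzero(mask.any(axis=1))  # rows that contain the pedestrian
    if rows.size == 0:
        return visible.copy(), np.zeros_like(mask)
    top, bottom = rows[0], rows[-1] + 1
    height = bottom - top
    stripe = max(1, int(round(lam * height)))  # stripe height in rows
    start = top + rng.integers(0, height - stripe + 1)  # random stripe position
    region = np.zeros_like(mask)
    region[start:start + stripe] = True
    region &= mask  # mix only inside the pedestrian region, never the background
    mixed = visible.copy()
    mixed[region] = infrared[region]  # paste infrared pixels into the visible image
    return mixed, region
```

Restricting the swap to `mask` leaves background pixels untouched, which is one plausible reading of "focusing on the pedestrian regions"; the paper's actual region selection and mixing rule may differ.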
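The MFT module is likewise described only as a combination of cross attention and convolution. The sketch below assumes one simple arrangement: feature tokens of the modality being compensated act as queries over the other modality's tokens, and the retrieved features are refined by a depthwise 3x3 convolution before a residual addition. The names `mft_block` and `depthwise_conv3x3`, the parameter keys, and the ordering of the two operations are hypothetical, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def depthwise_conv3x3(x, kernel):
    """Depthwise 3x3 convolution (cross-correlation) with zero 'same' padding.

    x: (H, W, D) feature map; kernel: (3, 3, D), one 3x3 filter per channel.
    """
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + w] * kernel[i, j]
    return out


def mft_block(f_src, f_other, params):
    """Hypothetical MFT-style block: cross attention, then a light convolution.

    f_src:   (H, W, D) features of the modality being compensated (queries).
    f_other: (H2, W2, D) features of the other modality (keys and values).
    params:  dict with projections "wq", "wk", "wv" of shape (D, D) and a
             depthwise "kernel" of shape (3, 3, D).
    Returns the compensated features (H, W, D) and the attention map.
    """
    h, w, d = f_src.shape
    q = f_src.reshape(-1, d) @ params["wq"]    # (H*W, D)
    k = f_other.reshape(-1, d) @ params["wk"]  # (H2*W2, D)
    v = f_other.reshape(-1, d) @ params["wv"]  # (H2*W2, D)
    # Each source token attends to every token of the other modality.
    attn = softmax(q @ k.T / np.sqrt(d))       # (H*W, H2*W2)
    transferred = (attn @ v).reshape(h, w, d)  # cross-modal features on the source grid
    local = depthwise_conv3x3(transferred, params["kernel"])  # cheap local refinement
    return f_src + local, attn                 # residual keeps the original features
```

A depthwise 3x3 kernel costs 9·D multiplications per position, compared with 9·D² for a dense 3x3 convolution over D channels, which illustrates one way such a hybrid block can stay lightweight.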