Qian Yongheng, Tang Su-Kit
Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China.
Department of Mechatronics and Information Engineering, Zunyi Vocational and Technical College, Zunyi 563000, China.
Sensors (Basel). 2025 Jan 1;25(1):192. doi: 10.3390/s25010192.
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality retrieval task that matches a person across visible and infrared camera views. Most existing works learn shared feature representations from the final embedding space of advanced networks to alleviate the modality gap between visible and infrared images. However, relying exclusively on high-level semantic information from the network's final layers restricts the shared representations and overlooks the benefits of low-level details. Unlike these methods, we propose a multi-scale contrastive learning network (MCLNet) with hierarchical knowledge synergy for VI-ReID. MCLNet is a novel two-stream contrastive deep-supervision framework designed to learn low-level details and high-level semantic representations simultaneously. It applies supervised contrastive learning (SCL) at each intermediate layer to strengthen visual representations and enhance cross-modality feature learning. Furthermore, a hierarchical knowledge synergy (HKS) strategy for pairwise knowledge matching promotes explicit information interaction across multi-scale features and improves information consistency. Extensive experiments on three benchmarks demonstrate the effectiveness of MCLNet.
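The abstract does not give the exact SCL formulation used at each intermediate layer. A minimal sketch, assuming the standard supervised contrastive loss of Khosla et al. (2020) applied to identity labels that span both modalities (so visible and infrared images of the same person act as positives), might look like the following; the feature shape and temperature value are illustrative assumptions, not details from the paper:

```python
import torch

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss over one batch.

    features: (N, D) embeddings, assumed L2-normalized per row.
    labels:   (N,) integer identity labels; visible and infrared
              images of the same identity share a label, so positive
              pairs cross the modality boundary.
    """
    n = features.size(0)

    # Pairwise cosine similarities, scaled by temperature.
    logits = features @ features.T / temperature

    # Exclude self-similarity from numerator and denominator alike.
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    logits = logits.masked_fill(self_mask, float('-inf'))

    # Positives: same identity, excluding the anchor itself.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log p(j | i) over all non-self candidates.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Mean log-likelihood of positives per anchor, averaged over the batch
    # (anchors with no positive contribute zero; clamp avoids div-by-zero).
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts).mean()
    return loss
```

In a deep-supervision setup of the kind the abstract describes, a loss of this form would be computed on the projected features of each intermediate stage and summed with the final-layer objective.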
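The abstract characterizes HKS only as pairwise knowledge matching across multi-scale features. One common realization of such matching in deep-supervision work is a symmetric KL-divergence term between the softened identity predictions of every pair of stage-wise heads. The sketch below shows that generic pattern, not the paper's actual HKS definition; the per-stage heads, the distillation temperature, and the all-pairs scheme are assumptions:

```python
import torch.nn.functional as F

def pairwise_knowledge_matching(logits_per_stage, temperature=4.0):
    """Hypothetical pairwise knowledge-matching term.

    logits_per_stage: list of (N, C) identity logits, one tensor per
    network stage (the multi-scale deep-supervision heads). Every pair
    of stages is pulled toward agreement with a symmetric KL term, one
    plausible reading of "pairwise knowledge matching".
    """
    loss, num_terms = 0.0, 0
    for i in range(len(logits_per_stage)):
        for j in range(i + 1, len(logits_per_stage)):
            p = F.log_softmax(logits_per_stage[i] / temperature, dim=1)
            q = F.log_softmax(logits_per_stage[j] / temperature, dim=1)
            # Symmetric KL between the two stages' softened predictions.
            loss += F.kl_div(p, q, log_target=True, reduction='batchmean')
            loss += F.kl_div(q, p, log_target=True, reduction='batchmean')
            num_terms += 2
    return loss / max(num_terms, 1)
```

Matching every pair, rather than distilling only from the deepest head, lets information flow in both directions between shallow and deep stages, which is consistent with the abstract's claim of explicit multi-scale information interaction.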