Nystromformer based cross-modality transformer for visible-infrared person re-identification.

Author information

Mishra Ranjit Kumar, Mondal Arijit, Mathew Jimson

Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Patna, Bihta, Patna, 801106, Bihar, India.

Publication information

Sci Rep. 2025 May 9;15(1):16224. doi: 10.1038/s41598-025-01226-5.

Abstract

Person re-identification (Re-ID) aims to accurately match individuals across different camera views, a critical task for surveillance and security applications, often under varying conditions such as illumination, pose, and background. Traditional Re-ID systems operate solely in the visible spectrum, which limits their effectiveness under varying lighting conditions and at night. To overcome these limitations, leveraging the visible-infrared (VIS-IR) domain becomes essential, as infrared imaging provides reliable information in low-light and night-time environments. However, the integration of VIS (visible) and IR (infrared) modalities introduces significant cross-modality discrepancies, posing a major challenge for feature alignment and fusion. To address this, we propose NiCTRAM: a Nyströmformer-based Cross-Modality Transformer designed for robust VIS-IR person re-identification. Our framework begins by extracting hierarchical features from both RGB and IR images through a shared convolutional neural network (CNN) backbone, ensuring the preservation of modality-specific characteristics. These features are then processed by parallel Nyströmformer encoders, which efficiently capture long-range dependencies in linear time using lightweight self-attention mechanisms. To bridge the modality gap, a cross-attention fusion block is introduced, where RGB and IR features interact and integrate second-order covariance statistics to model higher-order correlations. The fused features are subsequently refined through projection layers and optimized for re-identification using a classification head. Extensive experiments on benchmark VIS-IR person Re-ID datasets demonstrate that NiCTRAM outperforms existing methods, achieving state-of-the-art accuracy and robustness by effectively addressing the cross-modality challenges inherent in VIS-IR Re-ID. The proposed NiCTRAM model achieves significant improvements over the current SOTA in VIS-IR ReID. 
On the SYSU-MM01 dataset, it surpasses the SOTA by 4.21% in Rank-1 accuracy and 2.79% in mAP for all-search single-shot mode, with similar gains in multi-shot settings. Additionally, NiCTRAM outperforms existing methods on RegDB and LLCM, achieving up to 5.90% higher Rank-1 accuracy and 5.83% higher mAP in Thermal-to-Visible mode. We will make the code and the model available at https://github.com/Ranjitkm2007/NiCTRAM.
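The abstract states that the Nyströmformer encoders capture long-range dependencies in linear time. For orientation only (this is not the authors' code), a minimal NumPy sketch of the standard Nyström approximation to softmax attention: landmark queries and keys are taken as segment means of the token sequence, and softmax(QKᵀ/√d)·V is approximated as F·A⁺·(B·V), where A⁺ is a pseudo-inverse of the landmark-landmark attention. The function name `nystrom_attention` and the segment-mean landmark scheme are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, num_landmarks=8):
    """Approximate softmax attention via Nystrom landmarks.

    Cost is O(n * m) in the sequence length n (m = num_landmarks),
    versus O(n^2) for exact attention. Assumes n is divisible by m.
    """
    n, d = Q.shape
    m = num_landmarks
    # Landmarks: means over contiguous segments of the sequence
    Q_lm = Q.reshape(m, n // m, d).mean(axis=1)   # (m, d)
    K_lm = K.reshape(m, n // m, d).mean(axis=1)   # (m, d)
    scale = 1.0 / np.sqrt(d)
    F = softmax(Q @ K_lm.T * scale)               # (n, m) tokens -> landmarks
    A = softmax(Q_lm @ K_lm.T * scale)            # (m, m) landmarks -> landmarks
    B = softmax(Q_lm @ K.T * scale)               # (m, n) landmarks -> tokens
    # softmax(QK^T)V  ~  F @ pinv(A) @ (B @ V)
    return F @ np.linalg.pinv(A) @ (B @ V)        # (n, d)
```

A sanity check on the construction: when the number of landmarks equals the sequence length, each "segment mean" is a single token, so F = A = B = softmax(QKᵀ/√d) and the approximation collapses to exact attention.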


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f26e/12064762/c5f3663d9af9/41598_2025_1226_Fig1_HTML.jpg
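The abstract's cross-attention fusion block, where RGB and IR features interact and second-order covariance statistics are integrated, can be illustrated generically. The sketch below is an assumption-laden stand-in, not the paper's fusion block: RGB tokens attend over IR tokens with a residual connection, and the fused tokens are summarized by a first-order mean plus the upper triangle of their covariance matrix as a second-order descriptor. The function name `cross_modal_fuse` and the concatenation scheme are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fuse(rgb_tokens, ir_tokens):
    """Cross-attention from RGB queries to IR keys/values, followed by
    a second-order (covariance) pooling of the fused tokens.

    rgb_tokens: (n_rgb, d), ir_tokens: (n_ir, d) -> 1-D descriptor.
    """
    n, d = rgb_tokens.shape
    # RGB queries attend over IR tokens; residual keeps RGB identity cues
    attn = softmax(rgb_tokens @ ir_tokens.T / np.sqrt(d))  # (n_rgb, n_ir)
    fused = rgb_tokens + attn @ ir_tokens                  # (n_rgb, d)
    # Second-order statistics: sample covariance of the fused tokens
    centered = fused - fused.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (n - 1)                  # (d, d)
    second_order = cov[np.triu_indices(d)]                 # d*(d+1)/2 values
    first_order = fused.mean(axis=0)                       # (d,)
    return np.concatenate([first_order, second_order])
```

In a real Re-ID head such a descriptor would typically pass through projection layers and a classification head, as the abstract outlines; the covariance term is what lets the representation carry higher-order channel correlations rather than means alone.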
