Cao Xingyu, Ding Pengxin, Li Jie, Chen Mei
School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China.
Sensors (Basel). 2025 Feb 20;25(5):1298. doi: 10.3390/s25051298.
Visible-infrared person re-identification (VI-ReID) aims to match pedestrian images across the visible and infrared modalities, which requires minimizing the modality gap between them. Existing methods primarily extract cross-modality features in the spatial domain, which limits the range of useful information they can capture. To address this limitation, we propose a novel bi-frequency feature fusion network (BiFFN) that extracts and fuses high-frequency, low-frequency, and spatial-domain features to reduce the modality gap. Unlike conventional approaches that either focus on a single frequency component or rely on simple multi-branch fusion strategies, BiFFN addresses the modality discrepancy through systematic frequency-space co-learning. The network introduces a frequency-spatial enhancement (FSE) module to strengthen feature representations in both domains. A deep frequency mining (DFM) module improves the use of cross-modality information by exploiting the distinct properties of the high- and low-frequency components, and a cross-frequency fusion (CFF) module aligns the low-frequency features and fuses them with the high-frequency features to generate middle features that incorporate the critical information of each modality. To refine the distribution of identity features in the common space, we further develop a unified modality center (UMC) loss, which promotes a more balanced inter-modality distribution while preserving discriminative identity information. Extensive experiments demonstrate that the proposed BiFFN achieves state-of-the-art performance in VI-ReID: it reaches a Rank-1 accuracy of 77.5% and an mAP of 75.9% on the SYSU-MM01 dataset under the all-search mode, and a Rank-1 accuracy of 58.5% and an mAP of 63.7% on the LLCM dataset under the IR-VIS mode. These results verify that integrating feature fusion with frequency-domain information significantly reduces the modality gap and outperforms previous methods.
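Since the abstract describes BiFFN only at a high level, the following is a minimal sketch of the bi-frequency idea: splitting a backbone feature map into low- and high-frequency components before the two branches are processed further. The use of a 2D FFT with a radial cutoff, the function name `split_frequency`, and the `cutoff` parameter are illustrative assumptions, not the paper's actual operators.

```python
import torch
import torch.fft


def split_frequency(feat: torch.Tensor, cutoff: float = 0.25):
    """Split a (B, C, H, W) feature map into low- and high-frequency parts.

    Illustrative sketch only: the FFT-based split and the radial `cutoff`
    are assumptions standing in for BiFFN's bi-frequency decomposition.
    """
    B, C, H, W = feat.shape

    # Move to the frequency domain and centre the spectrum.
    spec = torch.fft.fftshift(torch.fft.fft2(feat, norm="ortho"), dim=(-2, -1))

    # Centred low-pass mask: 1 inside the cutoff radius, 0 outside.
    yy = torch.arange(H, device=feat.device).view(-1, 1) - H / 2
    xx = torch.arange(W, device=feat.device).view(1, -1) - W / 2
    radius = torch.sqrt(yy ** 2 + xx ** 2)
    low_mask = (radius <= cutoff * min(H, W) / 2).float()

    low_spec = spec * low_mask
    high_spec = spec * (1.0 - low_mask)

    # Back to the spatial domain, one tensor per frequency branch.
    low = torch.fft.ifft2(torch.fft.ifftshift(low_spec, dim=(-2, -1)), norm="ortho").real
    high = torch.fft.ifft2(torch.fft.ifftshift(high_spec, dim=(-2, -1)), norm="ortho").real
    return low, high
```

In BiFFN, the resulting low- and high-frequency branches would then feed modules such as DFM and CFF; only the decomposition step is sketched here.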
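The abstract likewise does not give the UMC loss formula, so the sketch below is only one plausible reading of a "unified modality center" objective: for each identity present in both modalities within a batch, pull its visible and infrared feature centers toward their midpoint. The function name, the MSE form, and the midpoint choice are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def unified_modality_center_loss(feats: torch.Tensor,
                                 labels: torch.Tensor,
                                 is_infrared: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a unified-modality-center objective.

    feats:       (N, D) embeddings from a mixed visible/infrared batch.
    labels:      (N,) identity labels.
    is_infrared: (N,) boolean modality flags (True = infrared sample).

    This is an interpretation of the UMC idea from the abstract, not the
    paper's actual loss.
    """
    loss = feats.new_zeros(())
    count = 0
    for pid in labels.unique():
        id_mask = labels == pid
        vis = feats[id_mask & ~is_infrared]
        ir = feats[id_mask & is_infrared]
        if len(vis) == 0 or len(ir) == 0:
            continue  # identity not seen in both modalities in this batch
        c_vis, c_ir = vis.mean(dim=0), ir.mean(dim=0)
        c_uni = 0.5 * (c_vis + c_ir)  # unified center for this identity
        # Pull both per-modality centers toward the shared center.
        loss = loss + F.mse_loss(c_vis, c_uni) + F.mse_loss(c_ir, c_uni)
        count += 1
    return loss / max(count, 1)
```

In practice such a term would be combined with identity-classification and metric losses so that the pull toward a shared center does not erase discriminative information, which matches the abstract's stated goal for the UMC loss.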