BiFFN: Bi-Frequency Guided Feature Fusion Network for Visible-Infrared Person Re-Identification.

Author Information

Cao Xingyu, Ding Pengxin, Li Jie, Chen Mei

Affiliations

School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China.

Publication Information

Sensors (Basel). 2025 Feb 20;25(5):1298. doi: 10.3390/s25051298.

Abstract

Visible-infrared person re-identification (VI-ReID) aims to match pedestrian images across modalities by minimizing the modality gap between them. Existing methods primarily extract cross-modality features in the spatial domain, which often limits how much useful information can be captured. To address this limitation, we propose a novel bi-frequency feature fusion network (BiFFN) that extracts and fuses high-frequency, low-frequency, and spatial-domain features to reduce the modality gap. Unlike conventional approaches that either focus on a single frequency component or employ simple multi-branch fusion strategies, our method fundamentally addresses the modality discrepancy through systematic frequency-space co-learning. The network introduces a frequency-spatial enhancement (FSE) module to strengthen feature representation in both domains. A deep frequency mining (DFM) module further optimizes the use of cross-modality information by exploiting the distinct characteristics of high- and low-frequency features, and a cross-frequency fusion (CFF) module aligns the low-frequency features and fuses them with the high-frequency features to generate intermediate features that carry critical information from each modality. To refine the distribution of identity features in the common space, we develop a unified modality center (UMC) loss, which promotes a more balanced inter-modality distribution while preserving discriminative identity information. Extensive experiments demonstrate that BiFFN achieves state-of-the-art performance in VI-ReID: it reaches a Rank-1 accuracy of 77.5% and an mAP of 75.9% on the SYSU-MM01 dataset under the all-search mode, and a Rank-1 accuracy of 58.5% and an mAP of 63.7% on the LLCM dataset under the IR-VIS mode. These results verify that integrating feature fusion with frequency-domain information significantly reduces the modality gap and outperforms previous methods.
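
The record contains only the abstract and no implementation details, so the following is a minimal, illustrative PyTorch sketch of the two ideas the abstract names: decomposing features into low- and high-frequency components (the kind of split the FSE/DFM/CFF modules operate on) and a center-style loss that balances the two modalities in the spirit of the UMC loss. The function names (`low_high_split`, `unified_center_loss`), the FFT low-pass mask, and the exact loss form are assumptions made for illustration, not the authors' formulation.

```python
import torch
import torch.nn.functional as F


def low_high_split(feat: torch.Tensor, radius_ratio: float = 0.25):
    """Split a (B, C, H, W) feature map into low- and high-frequency parts
    with a centered circular low-pass mask in the 2D Fourier domain.
    Illustrative only; BiFFN's actual decomposition may differ."""
    _, _, H, W = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat, norm="ortho"), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.arange(H, device=feat.device),
        torch.arange(W, device=feat.device),
        indexing="ij",
    )
    dist = torch.sqrt((yy - H / 2.0) ** 2 + (xx - W / 2.0) ** 2)
    mask = (dist <= radius_ratio * min(H, W)).float()  # 1 inside the low-pass radius
    low = torch.fft.ifft2(
        torch.fft.ifftshift(spec * mask, dim=(-2, -1)), norm="ortho"
    ).real
    high = feat - low  # residual carries edges and texture (high frequencies)
    return low, high


def unified_center_loss(vis_feat, vis_labels, ir_feat, ir_labels):
    """Center-style alignment term: pull each identity's visible and infrared
    feature centers toward their joint (unified) center. Assumes every identity
    in the batch has samples from both modalities (PK-style sampling)."""
    terms = []
    for pid in torch.unique(torch.cat([vis_labels, ir_labels])):
        v = vis_feat[vis_labels == pid].mean(dim=0)  # visible-modality identity center
        r = ir_feat[ir_labels == pid].mean(dim=0)    # infrared-modality identity center
        c = 0.5 * (v + r)                            # unified cross-modality center
        terms.append(F.mse_loss(v, c) + F.mse_loss(r, c))
    return torch.stack(terms).mean()
```

In a training loop, backbone feature maps from the visible and infrared branches could be decomposed with `low_high_split`, processed and fused by the frequency modules, and the resulting identity embeddings regularized with `unified_center_loss` alongside the usual identification loss; this mirrors the abstract's description only at sketch level.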

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d63/11902842/b7c9e6c3aea0/sensors-25-01298-g001.jpg
