ReMamba: a hybrid CNN-Mamba aggregation network for visible-infrared person re-identification

Author Information

Geng Haokun, Peng Jiaren, Yang Wenzhong, Chen Danny, Lv Hongzhen, Li Guanghan, Shao Yi

Affiliations

School of Computer Science and Technology (School of Cyberspace Security), Xinjiang University, Urumqi, 830046, China.

Xinjiang Key Laboratory of Multilingual Information Technology, Xinjiang University, Urumqi, 830046, China.

Publication Information

Sci Rep. 2024 Nov 26;14(1):29362. doi: 10.1038/s41598-024-80766-8.

Abstract

Visible-Infrared Person Re-identification (VI-ReID) is persistently challenged by significant intra-class variations and by cross-modality differences between cameras; the key therefore lies in extracting discriminative modality-shared features. Existing VI-ReID methods based on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) fall short in capturing global features and in controlling computational complexity, respectively. To tackle these challenges, we propose a hybrid network framework called ReMamba. Specifically, we first use a CNN as the backbone network to extract multi-level features. We then introduce the Visual State Space (VSS) model, which integrates the local features output by the CNN from lower to higher levels; these local features complement the global information, thereby sharpening the local details of the global features. Considering the potential redundancy and semantic differences between local and global features, we design an adaptive feature aggregation module that automatically filters and effectively aggregates both types of features, together with an auxiliary aggregation loss that optimizes the aggregation process. Furthermore, to better constrain cross-modality and intra-modality features, we design a modal consistency identity constraint loss that alleviates cross-modality differences and extracts modality-shared information. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the proposed ReMamba outperforms state-of-the-art VI-ReID methods.
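To make the two key components concrete, the following is a minimal PyTorch sketch, not the authors' released code. The channel-wise gating design of the aggregation module, the center-alignment form of the modal consistency loss, and all names (`AdaptiveFeatureAggregation`, `modal_consistency_loss`) are assumptions made for exposition; the paper's exact formulations may differ.

```python
# Illustrative sketch only; gating design and loss form are assumed, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveFeatureAggregation(nn.Module):
    """Fuses CNN local features with VSS global features via a learned gate.

    One plausible reading of "automatically filters and effectively aggregates
    both types of features": predict a per-channel gate from both streams and
    take a convex combination.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim),  # sees both streams
            nn.Sigmoid(),             # gate in (0, 1) per channel
        )

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # local_feat, global_feat: (batch, dim) pooled descriptors.
        g = self.gate(torch.cat([local_feat, global_feat], dim=-1))
        return g * local_feat + (1.0 - g) * global_feat


def modal_consistency_loss(feats: torch.Tensor,
                           labels: torch.Tensor,
                           modalities: torch.Tensor) -> torch.Tensor:
    """Pull the visible and infrared feature centers of each identity together.

    feats: (batch, dim); labels: (batch,) person IDs;
    modalities: (batch,) in {0, 1} (0 = visible, 1 = infrared).
    A center-alignment form is assumed here as one way to "alleviate
    cross-modality differences and extract modality-shared information".
    """
    loss = feats.new_zeros(())
    count = 0
    for pid in labels.unique():
        mask = labels == pid
        vis = feats[mask & (modalities == 0)]
        ir = feats[mask & (modalities == 1)]
        if len(vis) == 0 or len(ir) == 0:
            continue  # identity not sampled in both modalities in this batch
        loss = loss + F.mse_loss(vis.mean(dim=0), ir.mean(dim=0))
        count += 1
    return loss / max(count, 1)
```

In training, such a consistency term would typically be summed with the identity-classification and auxiliary aggregation losses; the abstract does not specify the loss weights, which would need tuning.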

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b8/11599763/baed7dba0eed/41598_2024_80766_Fig1_HTML.jpg
