

MambaReID: Exploiting Vision Mamba for Multi-Modal Object Re-Identification.

Author Information

Zhang Ruijuan, Xu Lizhong, Yang Song, Wang Li

Affiliations

School of Computer and Information, Hohai University, Nanjing 211106, China.

School of Mathematics and Statistics, Huaiyin Normal University, Huai'an 223300, China.

Publication Information

Sensors (Basel). 2024 Jul 17;24(14):4639. doi: 10.3390/s24144639.

Abstract

Multi-modal object re-identification (ReID) is a challenging task that seeks to identify objects across different image modalities by leveraging their complementary information. Traditional CNN-based methods are constrained by limited receptive fields, whereas Transformer-based approaches are hindered by high computational demands and a lack of convolutional biases. To overcome these limitations, we propose a novel fusion framework named MambaReID, integrating the strengths of both architectures with the effective VMamba. Specifically, our MambaReID consists of three components: Three-Stage VMamba (TSV), Dense Mamba (DM), and Consistent VMamba Fusion (CVF). TSV efficiently captures global context information and local details with low computational complexity. DM enhances feature discriminability by fully integrating inter-modality information with shallow and deep features through dense connections. Additionally, with well-aligned multi-modal images, CVF provides more granular modal aggregation, thereby improving feature robustness. The MambaReID framework, with its innovative components, not only achieves superior performance in multi-modal object ReID tasks, but also does so with fewer parameters and lower computational costs. Our proposed MambaReID's effectiveness is validated by extensive experiments conducted on three multi-modal object ReID benchmarks.
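To make the three components concrete, below is a minimal, hypothetical PyTorch sketch of the data flow the abstract describes: a per-modality three-stage backbone (TSV), dense aggregation of shallow and deep features across modalities (DM), and fusion of the spatially aligned modal feature maps (CVF). The real VMamba block is a 2D selective-scan state-space model; here a `VMambaBlockStub` substitutes a depthwise convolution plus MLP so the skeleton runs with plain PyTorch. All class names, dimensions, and the classifier head are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the MambaReID pipeline described in the abstract.
# Not the authors' code: VMambaBlockStub stands in for a real VMamba
# (2D selective-scan) block, and all names/sizes are assumptions.
import torch
import torch.nn as nn


class VMambaBlockStub(nn.Module):
    """Stand-in for a VMamba block (depthwise conv + MLP instead of a selective scan)."""
    def __init__(self, dim):
        super().__init__()
        self.mix = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * 2, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(dim * 2, dim, kernel_size=1),
        )

    def forward(self, x):
        x = x + self.mix(x)      # spatial mixing (selective scan in the paper)
        return x + self.mlp(x)   # channel mixing


class TSVStage(nn.Module):
    """One of the three backbone stages: strided downsampling + VMamba-style blocks."""
    def __init__(self, in_dim, out_dim, depth=2):
        super().__init__()
        self.down = nn.Conv2d(in_dim, out_dim, kernel_size=2, stride=2)
        self.blocks = nn.Sequential(*[VMambaBlockStub(out_dim) for _ in range(depth)])

    def forward(self, x):
        return self.blocks(self.down(x))


class MambaReIDSketch(nn.Module):
    """Per-modality three-stage backbone (TSV), dense pooling of shallow and deep
    features from every stage and modality (DM-like), and fusion of the aligned
    last-stage maps (CVF-like), followed by an ID classifier."""
    def __init__(self, num_modalities=3, dims=(64, 128, 256), num_ids=100):
        super().__init__()
        self.stems = nn.ModuleList(
            [nn.Conv2d(3, dims[0], kernel_size=4, stride=4) for _ in range(num_modalities)])
        self.backbones = nn.ModuleList([
            nn.ModuleList([TSVStage(dims[0] if i == 0 else dims[i - 1], dims[i])
                           for i in range(3)])
            for _ in range(num_modalities)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Dense aggregation over all stages of all modalities (shallow + deep).
        self.dense_proj = nn.Linear(num_modalities * sum(dims), dims[-1])
        # Fusion of spatially aligned final-stage maps from all modalities.
        self.fuse = nn.Sequential(
            nn.Conv2d(num_modalities * dims[-1], dims[-1], kernel_size=1),
            VMambaBlockStub(dims[-1]),
        )
        self.classifier = nn.Linear(2 * dims[-1], num_ids)

    def forward(self, modal_images):
        """modal_images: list of (B, 3, H, W) tensors, one per aligned modality."""
        stage_vecs, last_maps = [], []
        for stem, stages, img in zip(self.stems, self.backbones, modal_images):
            x = stem(img)
            for stage in stages:
                x = stage(x)
                stage_vecs.append(self.pool(x).flatten(1))  # keep shallow and deep features
            last_maps.append(x)
        dense_feat = self.dense_proj(torch.cat(stage_vecs, dim=1))
        fused = self.pool(self.fuse(torch.cat(last_maps, dim=1))).flatten(1)
        return self.classifier(torch.cat([dense_feat, fused], dim=1))


if __name__ == "__main__":
    model = MambaReIDSketch()
    imgs = [torch.randn(2, 3, 128, 64) for _ in range(3)]  # e.g. aligned RGB / NIR / TIR crops
    print(model(imgs).shape)  # torch.Size([2, 100])
```

The dense branch mirrors the abstract's claim that DM mixes shallow and deep features across modalities, while the fusion branch relies on the spatial alignment of the modal images; a faithful implementation would replace `VMambaBlockStub` with actual VMamba (selective-scan) blocks and train with ReID-specific losses.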


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d8f/11280729/d8aac949ed88/sensors-24-04639-g001.jpg
