Li Kunmo, Ou Yongsheng, Ning Jian, Kong Fanchang, Cai Haiyang, Li Haoyang
School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.
School of Computer Science, Wuhan University, Wuhan 430072, China.
Sensors (Basel). 2025 Jun 29;25(13):4056. doi: 10.3390/s25134056.
Visual Place Recognition (VPR) is a pivotal task in computer vision and robotics. Prevailing VPR methods rely predominantly on RGB-based features for query image retrieval and correspondence establishment. Such unimodal visual representations, however, are inherently susceptible to environmental variations, which inevitably degrades recognition accuracy. To address this problem, we propose a robust VPR framework that integrates the RGB and depth modalities. The architecture follows a coarse-to-fine paradigm: global retrieval of the top-N candidate images is performed with fused multimodal features, followed by geometric verification of these candidates using depth information. We propose a Discrete Wavelet Transform Fusion (DWTF) module that generates robust multimodal global descriptors by combining RGB and depth data via the discrete wavelet transform. Furthermore, we introduce a Spiking Neuron Graph Matching (SNGM) module, which extracts geometric structure and spatial distances from the depth data and employs graph matching to establish accurate depth feature correspondences. Extensive experiments on several VPR benchmarks demonstrate that our method achieves state-of-the-art performance while maintaining the best accuracy-efficiency trade-off.
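To make the fusion step concrete, the sketch below shows a generic discrete-wavelet-transform fusion of an RGB feature map with a depth feature map. It is a minimal illustration of the underlying technique only: the function name dwt_fuse, the Haar wavelet choice, and the average/max-magnitude fusion rules are our assumptions, not details taken from the paper's DWTF module, which is a learned network component.

```python
# Minimal sketch of wavelet-domain fusion of two feature maps. Hypothetical
# names throughout (dwt_fuse, rgb_feat, depth_feat); not the DWTF module itself.
import numpy as np
import pywt

def dwt_fuse(rgb_feat: np.ndarray, depth_feat: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    """Fuse two single-channel feature maps of equal shape in the wavelet domain."""
    assert rgb_feat.shape == depth_feat.shape, "feature maps must align"

    # Decompose each map into approximation (LL) and detail (LH, HL, HH) subbands.
    rgb_ll, (rgb_lh, rgb_hl, rgb_hh) = pywt.dwt2(rgb_feat, wavelet)
    dep_ll, (dep_lh, dep_hl, dep_hh) = pywt.dwt2(depth_feat, wavelet)

    # Simple fusion rules (our assumption): average the low-frequency content,
    # keep the stronger response in each high-frequency subband.
    ll = 0.5 * (rgb_ll + dep_ll)
    lh = np.where(np.abs(rgb_lh) > np.abs(dep_lh), rgb_lh, dep_lh)
    hl = np.where(np.abs(rgb_hl) > np.abs(dep_hl), rgb_hl, dep_hl)
    hh = np.where(np.abs(rgb_hh) > np.abs(dep_hh), rgb_hh, dep_hh)

    # Reconstruct a fused map with the inverse transform.
    return pywt.idwt2((ll, (lh, hl, hh)), wavelet)
```

In the framework described above, the fused representation would feed the global descriptor used for top-N retrieval; the sketch stops at the fused map.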
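The geometric-verification stage can likewise be illustrated with a plain graph-matching skeleton. The sketch below scores a retrieved candidate by comparing the pairwise-distance structure of depth-derived 3-D keypoints in the query and candidate images. It does not reproduce the spiking-neuron machinery of SNGM: the Hungarian-algorithm assignment and all names (verify_candidate, pts_query, pts_cand) are illustrative assumptions standing in for the paper's method.

```python
# Minimal sketch of graph matching over depth-derived keypoints, assuming
# equal keypoint counts per image. Not the SNGM module; the spiking-neuron
# solver is replaced here by a classical Hungarian assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def verify_candidate(pts_query: np.ndarray, pts_cand: np.ndarray) -> float:
    """Score geometric consistency between two (N, 3) sets of 3-D keypoints
    back-projected from depth; lower scores mean more consistent geometry."""
    assert pts_query.shape == pts_cand.shape, "sketch assumes equal keypoint counts"

    # Edge structure of each graph: pairwise Euclidean distances between nodes.
    d_q = cdist(pts_query, pts_query)
    d_c = cdist(pts_cand, pts_cand)

    # Node descriptor: each point's sorted distance profile, so the comparison
    # is invariant to the ordering of the other keypoints.
    profile_q = np.sort(d_q, axis=1)
    profile_c = np.sort(d_c, axis=1)
    cost = cdist(profile_q, profile_c)

    # One-to-one correspondence minimizing total profile mismatch.
    row, col = linear_sum_assignment(cost)
    return float(cost[row, col].mean())
```

A verification stage of this kind would re-rank the top-N retrieved candidates by their geometric-consistency score before reporting the final match.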