Multi-scale parallel gated local feature transformer.

Author information

Qu Hangzhou, Hu Zhuhua, Wu Jiaqi

Affiliation

School of Information and Communication Engineering, Hainan University, Haikou, 570228, China.

Publication information

Sci Rep. 2025 Mar 5;15(1):7684. doi: 10.1038/s41598-025-91857-5.

DOI:10.1038/s41598-025-91857-5

PMID:40044875

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11882942/

Abstract

Visual Simultaneous Localization and Mapping (VSLAM) is a crucial technology for autonomous mobile vision robots. However, existing methods often suffer from low localization accuracy and poor robustness in scenarios with significant scale variations and low-texture environments, primarily due to insufficient feature extraction and reduced matching precision. To address these challenges, this paper proposes an improved multi-scale local feature matching algorithm based on LoFTR, named MSpGLoFTR. First, we introduce a Multi-Scale Local Attention Module (MSLAM), which achieves feature fusion and resolution alignment through multi-scale window partitioning and a shared multi-layer perceptron (MLP). Second, a Multi-Scale Parallel Attention Module is designed to capture features across various scales, enhancing the model's adaptability to large-scale features and highly similar pixel regions. Finally, a Gated Convolutional Network (GCN) mechanism is incorporated to dynamically adjust weights, emphasizing key features while suppressing background noise, thereby further improving matching precision and robustness. Experimental results demonstrate that MSpGLoFTR outperforms LoFTR in terms of matching precision, relative pose estimation performance, and adaptability to complex scenarios. Notably, it excels in environments with significant illumination changes, scale variations, and viewpoint shifts. This makes MSpGLoFTR an efficient and robust feature matching solution for complex vision tasks.
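Two of the ideas named in the abstract, multi-scale window partitioning and gating, can be illustrated with a toy sketch. The abstract does not specify the window sizes, the gating formula, or any layer details, so everything below is an assumption made for illustration: the helper names `partition_windows` and `gate` are hypothetical, the window sizes (1, 2, 4) are arbitrary, and the gate is a generic sigmoid gate rather than the authors' GCN design.

```python
import math


def partition_windows(grid, size):
    """Split a 2D grid (list of rows) into non-overlapping size x size windows.

    Hypothetical helper: the paper's actual partitioning scheme is not
    described in the abstract.
    """
    h, w = len(grid), len(grid[0])
    assert h % size == 0 and w % size == 0, "grid must divide evenly"
    windows = []
    for r in range(0, h, size):
        for c in range(0, w, size):
            windows.append([row[c:c + size] for row in grid[r:r + size]])
    return windows


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(features, gate_logits):
    """Generic element-wise gate: scale each feature by sigmoid(logit).

    Assumption: a sigmoid gate stands in for the paper's gating mechanism.
    Large positive logits pass a feature through; large negative logits
    suppress it.
    """
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


# A 4x4 toy "feature map" partitioned at three assumed window sizes.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
multi_scale = {s: partition_windows(grid, s) for s in (1, 2, 4)}

# Gating example: the first feature is kept, the second is suppressed.
gated = gate([1.0, 1.0], [10.0, -10.0])
```

In this sketch, smaller windows isolate fine local detail while the largest window covers the whole map, and the gate shows how a learned weight could emphasize some features and damp others. A real model would operate on learned feature tensors rather than Python lists.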
