Hu Yuekun, Liu Yingfan, Hui Bin
Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China.
Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China.
Sensors (Basel). 2024 Dec 25;25(1):44. doi: 10.3390/s25010044.
Cross-view geo-localization (CVGL) aims to determine the capture location of street-view images by matching them with corresponding 2D maps, such as satellite imagery. While recent bird's eye view (BEV)-based methods have advanced this task by addressing viewpoint and appearance differences, existing approaches typically rely solely on either OpenStreetMap (OSM) data or satellite imagery, limiting localization robustness due to single-modality constraints. This paper presents a novel CVGL method that fuses OSM data with satellite imagery, leveraging their complementary strengths to enhance localization robustness. We integrate the semantic richness and structural information of OSM with the high-resolution visual detail of satellite imagery, creating a unified 2D geospatial representation. Additionally, we employ a transformer-based BEV perception module that uses attention mechanisms to construct fine-grained BEV features from street-view images for matching against the fused map features. Compared to state-of-the-art methods that use only OSM data, our approach achieves substantial improvements, raising recall by 12.05% and 12.06% on the KITTI benchmark for lateral and longitudinal localization within a 1 m error, respectively.
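The abstract describes a three-stage pipeline: fuse OSM semantic layers with satellite imagery into one 2D map representation, lift street-view features into BEV space via attention, and match the two. The sketch below illustrates that flow with toy stand-ins; all names, dimensions, and encoders here are hypothetical simplifications (channel concatenation for fusion, single-query dot-product attention for the BEV module, cosine similarity for matching), not the paper's actual architecture.

```python
import math
import random

random.seed(0)

H, W = 8, 8  # toy map-tile size (hypothetical)

def rand_map(channels):
    return [[[random.random() for _ in range(W)] for _ in range(H)]
            for _ in range(channels)]

# Hypothetical inputs: satellite RGB (3 channels) and rasterized OSM semantic
# layers (e.g. road / building masks, 4 channels). Fusion here is simple
# channel concatenation into one unified 2D geospatial representation.
sat = rand_map(3)
osm = rand_map(4)
fused_map = sat + osm            # 7-channel fused map tile

def channel_means(tile):
    """Collapse each channel to its mean: a toy stand-in for a learned map encoder."""
    return [sum(sum(row) for row in ch) / (H * W) for ch in tile]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_pool(tokens, query):
    """Single-query scaled dot-product attention, standing in for the
    transformer BEV module that lifts street-view tokens to a BEV feature."""
    d = len(query)
    scores = [sum(t[i] * query[i] for i in range(d)) / math.sqrt(d) for t in tokens]
    weights = softmax(scores)
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy street-view patch tokens (7-dim, matching the fused map descriptor).
tokens = [[random.gauss(0, 1) for _ in range(7)] for _ in range(16)]
query = [random.gauss(0, 1) for _ in range(7)]
bev_feat = attention_pool(tokens, query)

map_feat = channel_means(fused_map)
similarity = cosine(bev_feat, map_feat)  # matching score between the two views
```

In the actual method, localization would amount to scoring a street-view BEV feature against fused map features at many candidate poses and taking the best match; the single similarity score above stands in for one such comparison.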