Cai Jun-Xiong, Feng Wensen, Chen Hao-Xiang, Mu Tai-Jiang
IEEE Trans Image Process. 2023;32:6401-6412. doi: 10.1109/TIP.2023.3332212. Epub 2023 Nov 28.
This paper presents a Semantic Positioning System (SPS) that improves the accuracy of mobile-device geo-localization in outdoor urban environments. Although the traditional Global Positioning System (GPS) offers rough localization, it lacks the accuracy required by applications such as Augmented Reality (AR). Our SPS integrates Geographic Information System (GIS) data, GPS signals, and visual image information to estimate the 6-Degree-of-Freedom (6-DoF) pose through cross-view semantic matching. The approach scales well and supports GIS context at different Levels of Detail (LOD). The map is represented as a Digital Elevation Model (DEM), a cost-effective aerial map that allows fast deployment over large areas. However, the DEM lacks geometric and texture detail, which makes it hard for traditional visual feature extraction to establish pixel- or voxel-level cross-view correspondences. To address this, we sample observation pixels from the query ground-view image using predicted semantic labels. We then propose an iterative homography estimation method based on these semantic correspondences. To improve the efficiency of the overall system, we further employ a heuristic search to speed up the matching process. The proposed method is robust, real-time, and automatic. Quantitative experiments on the challenging Bund dataset show that we achieve a positioning accuracy of 73.24%, surpassing the baseline skyline-based method by 20%. Compared with the state-of-the-art semantic-based approach on the KITTI dataset, we improve the positioning accuracy by 5% on average.
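The abstract does not say how semantic correspondences are formed or how the homography is refined, so the following is only a generic, ICP-style sketch under our own assumptions, not the authors' method. Query pixels and map (e.g. rendered DEM) pixels are assumed to be plain (x, y, label) tuples; each query pixel is matched to the nearest map pixel with the same predicted label under the current homography, and the homography is refit by linear least squares (DLT with h33 fixed to 1). All function names are hypothetical, and the GPS prior, DEM rendering, and the paper's heuristic search are omitted.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]
Labeled = Tuple[float, float, str]  # hypothetical (x, y, semantic label) sample
Matrix = List[List[float]]


def sample_semantic_pixels(label_map: Sequence[Sequence[str]],
                           keep_labels: Set[str], stride: int = 1) -> List[Labeled]:
    """Keep every stride-th pixel whose predicted label is in keep_labels."""
    samples = []
    for y in range(0, len(label_map), stride):
        for x in range(0, len(label_map[y]), stride):
            if label_map[y][x] in keep_labels:
                samples.append((float(x), float(y), label_map[y][x]))
    return samples


def solve_linear(A: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences: singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def apply_homography(H: Matrix, p: Point) -> Point:
    """Map a point through H using homogeneous coordinates."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def fit_homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Least-squares DLT with h33 = 1; needs >= 4 non-degenerate pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6 x + h7 y + 1) = h0 x + h1 y + h2, rearranged to be linear in h
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Solve the normal equations (A^T A) h = A^T b.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(AtA, Atb)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def iterative_semantic_homography(query: Sequence[Labeled],
                                  reference: Sequence[Labeled],
                                  iters: int = 10) -> Matrix:
    """Alternate same-label nearest-neighbour matching and homography refitting."""
    H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(iters):
        src, dst = [], []
        for x, y, label in query:
            u, v = apply_homography(H, (x, y))
            candidates = [(rx, ry) for rx, ry, rl in reference if rl == label]
            if not candidates:
                continue  # no map pixel carries this semantic class
            best = min(candidates, key=lambda r: (r[0] - u) ** 2 + (r[1] - v) ** 2)
            src.append((x, y))
            dst.append(best)
        if len(src) < 4:
            break  # too few semantic matches to constrain a homography
        H = fit_homography(src, dst)
    return H
```

The brute-force matching costs O(N·M) per iteration; the heuristic search mentioned in the abstract is presumably what avoids this in practice, but its details are not given there. Restricting matches to pixels with the same label is what lets the sketch work without texture descriptors, which the DEM does not provide.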