Enhanced outdoor visual localization using Py-Net voting segmentation approach.

Authors

Wang Jing, Guo Cheng, Hu Shaoyi, Wang Yibo, Fan Xuhui

Affiliation

College of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an, China.

Publication information

Front Robot AI. 2024 Oct 9;11:1469588. doi: 10.3389/frobt.2024.1469588. eCollection 2024.

Abstract

Camera relocalization determines the position and orientation of a camera in 3D space. Although methods based on scene coordinate regression yield highly accurate results in indoor scenes, they perform poorly in outdoor scenes, which are larger in scale and more complex. A visual localization method, Py-Net, is therefore proposed herein. Py-Net is based on voting segmentation and comprises a main encoder containing the Py-layer and two branch decoders. The Py-layer combines pyramid convolution with 1 × 1 convolution kernels to extract features across multiple levels with fewer parameters, enhancing the model's ability to extract scene information. Coordinate attention was added at the end of the encoder for feature correction, which improved the model's robustness to interference. To prevent the feature loss caused by repetitive structures and low-texture images in the scene, deep over-parameterized convolution modules were incorporated into the segmentation and voting decoders. Landmark segmentation and voting maps were used to establish the relation between images and landmarks in 3D space, reducing anomalies and achieving high precision with a small number of landmarks. The experimental results show that, in multiple outdoor scenes, Py-Net achieves lower distance and angle errors than existing methods. Additionally, compared with VS-Net, which also uses a voting segmentation structure, Py-Net reduces the number of parameters by 31.85% and decreases the model size from 236 MB to 170 MB.
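To illustrate how a segmentation map and a voting map could jointly locate landmarks in an image, the following is a minimal sketch and not the authors' implementation. It assumes that each pixel carries a landmark label and a 2D direction vote pointing toward that landmark, and that the landmark position is taken as the least-squares intersection of the vote lines. The abstract does not specify the voting formulation, so the function names `estimate_landmark` and `landmarks_from_maps`, the map layout, and the least-squares step are all assumptions made for illustration.

```python
import math


def estimate_landmark(pixels, directions):
    """Least-squares intersection of 2D vote lines (illustrative assumption).

    Pixel p_i votes along direction d_i. The point x that minimizes the sum
    of squared perpendicular distances to all vote lines satisfies
    sum(I - d_i d_i^T) x = sum(I - d_i d_i^T) p_i, a 2x2 linear system.
    """
    a11 = a12 = a22 = 0.0
    b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(pixels, directions):
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue  # a zero vote carries no direction information
        dx, dy = dx / norm, dy / norm
        # Entries of the projector onto the normal of the vote line.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("votes are parallel or missing; landmark undetermined")
    # Closed-form solution of the symmetric 2x2 system.
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def landmarks_from_maps(seg, votes):
    """Group pixels by landmark label and estimate one 2D point per label.

    seg   : 2D list of landmark ids, with None for pixels without a landmark
    votes : 2D list of (dx, dy) direction votes, same shape as seg
    Labels whose votes do not determine a point are skipped.
    """
    groups = {}
    for row, (seg_row, vote_row) in enumerate(zip(seg, votes)):
        for col, (label, vote) in enumerate(zip(seg_row, vote_row)):
            if label is None:
                continue
            pts, dirs = groups.setdefault(label, ([], []))
            pts.append((float(col), float(row)))  # (x, y) = (column, row)
            dirs.append(vote)
    result = {}
    for label, (pts, dirs) in groups.items():
        try:
            result[label] = estimate_landmark(pts, dirs)
        except ValueError:
            continue
    return result


# Hypothetical 3x3 maps: three pixels of landmark 7 vote toward pixel (1, 1).
seg = [[7, None, 7], [None, None, None], [7, None, None]]
votes = [[(1, 1), (0, 0), (-1, 1)], [(0, 0)] * 3, [(1, -1), (0, 0), (0, 0)]]
print(landmarks_from_maps(seg, votes))
```

Under these assumptions, each recovered 2D point would be paired with the known 3D coordinates of its landmark, and 2D-3D correspondences of this kind are commonly passed to a PnP solver to obtain the camera pose.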

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/659d/11497456/277a68072f68/frobt-11-1469588-g001.jpg
