Extreme Robotics Lab, University of Birmingham, Birmingham B15 2TT, UK.
Lincoln Centre for Autonomous Systems (L-CAS), University of Lincoln, Lincoln LN6 7TS, UK.
Sensors (Basel). 2018 Sep 14;18(9):3099. doi: 10.3390/s18093099.
In this paper, a novel Pixel-Voxel network is proposed for dense 3D semantic mapping, which can perform dense 3D mapping while simultaneously recognizing and labelling the semantic category of each point in the 3D map. Our approach fully leverages the advantages of different modalities: the PixelNet learns high-level contextual information from 2D RGB images, and the VoxelNet learns 3D geometrical shapes from the 3D point cloud. Unlike the existing architecture, which fuses score maps from different modalities with equal weights, we propose a softmax weighted fusion stack that adaptively learns the varying contributions of PixelNet and VoxelNet and fuses the score maps according to their respective confidence levels. Our approach achieved competitive results on both the SUN RGB-D and NYU V2 benchmarks, and the proposed system runs at around 13 Hz, enabling near-real-time performance on an eight-core i7 PC with a single Titan X GPU.
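The difference between equal-weight fusion and softmax weighted fusion can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the per-class pair of fusion logits, and the toy scores are all hypothetical. It assumes only that each network produces one score per class for a given point and that the fusion weights are obtained by applying a softmax to learnable logits, so that the two weights for each class sum to one.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def equal_weight_fusion(pixel_scores, voxel_scores):
    """Baseline: average the two score vectors with fixed 0.5 / 0.5 weights."""
    return [0.5 * (p + v) for p, v in zip(pixel_scores, voxel_scores)]


def softmax_weighted_fusion(pixel_scores, voxel_scores, fusion_logits):
    """Fuse per-class scores with softmax-normalised weights.

    fusion_logits is a hypothetical list of (pixel_logit, voxel_logit)
    pairs, one pair per class. In a trained network these would be
    learned parameters; here they are set by hand.
    """
    fused = []
    for p, v, (lp, lv) in zip(pixel_scores, voxel_scores, fusion_logits):
        w_pixel, w_voxel = softmax([lp, lv])  # the two weights sum to 1
        fused.append(w_pixel * p + w_voxel * v)
    return fused


def argmax(scores):
    """Index of the highest score, i.e. the predicted class label."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy scores for a single 3D point over three classes (made-up numbers).
pixel_scores = [0.2, 1.5, 0.3]
voxel_scores = [0.1, 0.4, 2.0]

# Zero logits give 0.5 / 0.5 weights, which reduces to equal-weight fusion.
equal_logits = [(0.0, 0.0)] * 3
fused_equal = softmax_weighted_fusion(pixel_scores, voxel_scores, equal_logits)

# Logits that favour the pixel branch shift the decision towards its scores.
pixel_biased_logits = [(3.0, 0.0)] * 3
fused_biased = softmax_weighted_fusion(
    pixel_scores, voxel_scores, pixel_biased_logits
)

label_equal = argmax(fused_equal)
label_biased = argmax(fused_biased)
```

With zero logits the sketch reproduces the equal-weight baseline. Changing the logits changes how much each modality contributes to every class, which is the kind of adaptive behaviour the abstract attributes to the fusion stack.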