IEEE Trans Vis Comput Graph. 2020 May;26(5):2012-2022. doi: 10.1109/TVCG.2020.2973477. Epub 2020 Feb 13.
Semantic understanding of 3D environments is critical both for unmanned systems and for human-involved immersive virtual/augmented reality (VR/AR) experiences. Spatially-sparse convolution, which takes advantage of the intrinsic sparsity of 3D point cloud data, makes high-resolution 3D convolutional neural networks tractable and achieves state-of-the-art results on 3D semantic segmentation problems. However, the exhaustive computation limits the practical use of semantic 3D perception for VR/AR applications on portable devices. In this paper, we identify that the efficiency bottleneck lies in the unorganized memory access of the sparse convolution steps: the points are stored independently based on a predefined dictionary, which is inefficient given the limited memory bandwidth of parallel computing devices (GPUs). With the insight that points lie on continuous 2D surfaces in 3D space, we propose a chunk-based sparse convolution scheme that reuses the neighboring points within each spatially organized chunk. We further propose an efficient multi-layer adaptive fusion module that exploits the spatial consistency cue of 3D data to further reduce the computational burden. Quantitative experiments on public datasets demonstrate that our approach runs 11× faster than previous approaches with competitive accuracy. By implementing semantic and geometric 3D reconstruction simultaneously on a portable tablet device, we demonstrate a foundational platform for immersive AR applications.
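The contrast between dictionary-based and chunk-based neighbor access can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `conv_hash` and `conv_chunked`, the chunk size, the all-ones 3×3×3 kernel, and the scalar per-voxel features are all assumptions made for illustration, and the code runs on the CPU in pure Python, so it shows only the access pattern and not GPU memory behavior.

```python
from collections import defaultdict
from itertools import product

# All 27 offsets of a 3x3x3 neighborhood (assumed kernel footprint).
OFFSETS = list(product((-1, 0, 1), repeat=3))


def conv_hash(voxels):
    """Baseline: one dictionary lookup per (voxel, offset) pair."""
    out = {}
    for (x, y, z) in voxels:
        out[(x, y, z)] = sum(
            voxels.get((x + dx, y + dy, z + dz), 0.0) for dx, dy, dz in OFFSETS
        )
    return out


def conv_chunked(voxels, chunk=4):
    """Chunked variant: gather each chunk plus a one-voxel halo into a
    dense local block once, then read neighbors by plain array indexing."""
    # Group active voxels by the chunk that contains them.
    chunks = defaultdict(list)
    for (x, y, z) in voxels:
        chunks[(x // chunk, y // chunk, z // chunk)].append((x, y, z))

    side = chunk + 2  # chunk extent plus a halo of one voxel on each side
    out = {}
    for (cx, cy, cz), members in chunks.items():
        # World coordinate of the block's corner (including the halo).
        ox, oy, oz = cx * chunk - 1, cy * chunk - 1, cz * chunk - 1
        block = [0.0] * side ** 3
        # Gather step: the only place the dictionary is consulted.
        for lx, ly, lz in product(range(side), repeat=3):
            value = voxels.get((ox + lx, oy + ly, oz + lz))
            if value is not None:
                block[(lx * side + ly) * side + lz] = value
        # Compute step: neighbors are reused from the local block.
        for (x, y, z) in members:
            lx, ly, lz = x - ox, y - oy, z - oz
            out[(x, y, z)] = sum(
                block[((lx + dx) * side + (ly + dy)) * side + (lz + dz)]
                for dx, dy, dz in OFFSETS
            )
    return out


# A flat 8x8 surface patch at z = 0, each voxel carrying the feature 1.0.
surface = {(x, y, 0): 1.0 for x in range(8) for y in range(8)}
assert conv_hash(surface) == conv_chunked(surface)
```

In this sketch the chunked variant performs a fixed number of dictionary lookups per chunk (216 for a chunk size of 4) instead of 27 per voxel, so it reduces lookups only when a chunk holds more than 8 active voxels, which is the case the surface-continuity insight relies on.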