The Graduate Center, Computer Science Department, City University of New York, New York, NY 10016, USA.
Hunter College & The Graduate Center, Computer Science Department, City University of New York, New York, NY 10065, USA.
Sensors (Basel). 2021 Feb 9;21(4):1213. doi: 10.3390/s21041213.
Instance segmentation and object detection are significant problems in computer vision and robotics. We address these problems by proposing a novel object segmentation and detection system. First, we detect 2D objects based on RGB, depth-only, or RGB-D images. We then propose a 3D convolution-based system, named Frustum VoxNet. This system generates frustums from the 2D detection results, proposes 3D candidate voxelized images for each frustum, and uses a 3D convolutional neural network (CNN) on these candidate voxelized images to perform 3D instance segmentation and object detection. Results on the SUN RGB-D dataset show that our RGB-D-based system's 3D inference is much faster than state-of-the-art methods, without a significant loss of accuracy. At the same time, we can provide segmentation and detection results using depth-only images, with accuracy comparable to RGB-D-based systems. This is important because our methods can also work well in low-light conditions, or with sensors that do not acquire RGB images. Finally, using segmentation as part of our pipeline increases detection accuracy while also providing 3D instance segmentation.
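The frustum and voxelization steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), a depth image in metres, and a single axis-aligned occupancy grid per 2D box. The function names `frustum_points` and `voxelize`, the grid origin, the voxel size, and the grid shape are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

def frustum_points(depth, box, fx, fy, cx, cy):
    """Back-project the depth pixels inside a 2D box into 3D camera coordinates.

    depth: (H, W) array of depths (0 marks a missing reading; assumption).
    box:   (x0, y0, x1, y1) pixel bounds of a 2D detection, x1/y1 exclusive.
    """
    x0, y0, x1, y1 = box
    # Pixel row (v) and column (u) coordinates covered by the box.
    vs, us = np.mgrid[y0:y1, x0:x1]
    z = depth[y0:y1, x0:x1]
    valid = z > 0                      # drop pixels with no depth reading
    z = z[valid]
    # Standard pinhole back-projection.
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def voxelize(points, origin, voxel_size, grid_shape):
    """Turn an (N, 3) point set into a binary occupancy grid (hypothetical layout)."""
    grid = np.zeros(grid_shape, dtype=np.uint8)
    if len(points) == 0:
        return grid
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    # Keep only points that fall inside the grid bounds.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx = idx[inside]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# Toy usage: a flat 4x4 depth map at 2 m and a 2x2 detection box.
depth = np.full((4, 4), 2.0)
pts = frustum_points(depth, (0, 0, 2, 2), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
grid = voxelize(pts, origin=(0.0, 0.0, 0.0), voxel_size=1.0, grid_shape=(4, 4, 4))
```

In a pipeline like the one described, a grid of this kind would be the input to the 3D CNN; how candidate grids are placed and sized within each frustum is a design choice the abstract leaves open.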