TECNALIA, Basque Research and Technology Alliance (BRTA), Mikeletegi Pasealekua 7, 20009 Donostia-San Sebastián, Spain.
Robotics and Autonomous Systems Group, Universidad del País Vasco/Euskal Herriko Unibertsitatea, 48940 Basque, Spain.
Sensors (Basel). 2021 Feb 4;21(4):1078. doi: 10.3390/s21041078.
Deep learning methods have been successfully applied to image processing, mainly using 2D vision sensors. Recently, the rise of depth cameras and other similar 3D sensors has opened the field to new perception techniques. Nevertheless, 3D convolutional neural networks perform slightly worse than other 3D deep learning methods, and even worse than their 2D counterparts. In this paper, we propose to improve 3D deep learning results by transferring the pretrained weights learned in 2D networks to their corresponding 3D versions. Using an industrial object recognition context, we have analyzed different combinations of 3D convolutional networks (VGG16, ResNet, Inception ResNet, and EfficientNet), comparing their recognition accuracy. The highest accuracy, 0.9217, is obtained with EfficientNetB0 using extrusion, which is comparable to state-of-the-art methods. We also observed that the transfer approach improved the accuracy of the 3D Inception ResNet by up to 18% with respect to the score of the 3D approach alone.
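The weight-transfer idea described above can be sketched in a few lines: a pretrained 2D convolution kernel is "extruded" into a 3D kernel by replicating it along the new depth axis and rescaling so that a constant input produces the same activation. This is a minimal illustration of the general inflation technique, not the paper's exact procedure; the function name and tensor layout (`out_channels, in_channels, kh, kw`) are assumptions.

```python
import numpy as np

def extrude_2d_to_3d(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Extrude a pretrained 2D conv kernel of shape (out, in, kh, kw)
    into a 3D kernel of shape (out, in, depth, kh, kw).

    Each depth slice is a copy of the 2D kernel divided by `depth`,
    so the summed response over the depth axis matches the original
    2D filter's response on a depth-constant input.
    """
    # Insert a depth axis and replicate the 2D kernel along it.
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], depth, axis=2)
    # Rescale so activations are preserved for depth-constant inputs.
    return w3d / depth

# Example: inflate a (64, 3, 3, 3) 2D kernel to a (64, 3, 3, 3, 3) 3D kernel.
w2d = np.random.rand(64, 3, 3, 3).astype(np.float32)
w3d = extrude_2d_to_3d(w2d, depth=3)
```

Summing the 3D kernel over its depth axis recovers the original 2D kernel exactly, which is why the initialization behaves like the pretrained 2D network before fine-tuning begins.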