Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India.
Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11543, Saudi Arabia.
Sensors (Basel). 2021 Nov 28;21(23):7950. doi: 10.3390/s21237950.
Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area of using depth information in addition to color image (RGB) data for scene understanding. Transfer learning of deep convolutional networks with paired RGB and depth (RGB-D) images must integrate these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel's local surface normal (HHA), so that transfer learning can be applied with networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for real-time scene prediction, although this matters less during the training phase. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding performs as well as or better than HHA in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized, pretrained VGG16 network. We address the class imbalance in the image dataset with a feature-level method based on the synthetic minority oversampling technique (SMOTE). With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
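The abstract mentions applying SMOTE at the feature level to balance the scene classes. As a minimal sketch of the core SMOTE interpolation step, operating on minority-class feature vectors (e.g., CNN embeddings), the following NumPy implementation is illustrative only; the function name and parameters are assumptions, not the authors' code:

```python
import numpy as np

def smote_features(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority-class feature vectors by
    interpolating between each chosen sample and one of its k nearest
    minority-class neighbours (the standard SMOTE recipe)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours per sample
    synth = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                          # random minority sample
        b = nn[a, rng.integers(min(k, n - 1))]       # one of its neighbours
        lam = rng.random()                           # interpolation factor
        synth[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return synth
```

Each synthetic vector lies on the line segment between a real minority sample and one of its neighbours, so the oversampled set stays inside the convex hull of the original minority features.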