DTS-Net: Depth-to-Space Networks for Fast and Accurate Semantic Object Segmentation

Authors

Ibrahem Hatem, Salem Ahmed, Kang Hyun-Soo

Affiliations

Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Korea.

Electrical Engineering Department, Faculty of Engineering, Assiut University, Assiut 71515, Egypt.

Publication

Sensors (Basel). 2022 Jan 3;22(1):337. doi: 10.3390/s22010337.

PMID:35009879

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8749585/

Abstract

We propose Depth-to-Space Net (DTS-Net), an effective technique for semantic segmentation built on the efficient sub-pixel convolutional neural network. The technique is inspired by depth-to-space (DTS) image reconstruction, originally used for image and video super-resolution, and combines it with a mask-enhancement filtration technique based on multi-label classification, namely Nearest Label Filtration. The proposed architectures are built on depth-wise separable convolutions. We propose both a deep network, DTS-Net, and a lightweight network, DTS-Net-Lite, for real-time semantic segmentation; they use Xception and MobileNetV2, respectively, as feature extractors. In addition, we explore joint semantic segmentation and depth estimation and demonstrate that the proposed technique can efficiently perform both tasks simultaneously, outperforming state-of-the-art (SOTA) methods. We train and evaluate the proposed method on the PASCAL VOC2012, NYUV2, and CITYSCAPES benchmarks and obtain high mean intersection over union (mIoU) and mean pixel accuracy (Pix.acc.) with simple, lightweight convolutional neural network architectures. Notably, the proposed method outperforms SOTA methods that rely on encoder-decoder architectures, although its implementation and computations are far simpler.
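The depth-to-space step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes an ESPCN-style head whose last convolution emits r·r·C channels at 1/r of the output resolution (C classes, upscaling factor r), rearranges those channels into a full-resolution C-channel score map, and takes a per-pixel argmax. The function names `depth_to_space` and `argmax_labels` are hypothetical, and the channel ordering is chosen to match TensorFlow's `tf.nn.depth_to_space` in NHWC layout. Nearest Label Filtration and the multi-label classifier are not modeled, since the abstract does not specify them.

```python
from typing import List

# Feature/score maps as nested lists indexed [row][col][channel] (HWC layout).
Map3D = List[List[List[float]]]


def depth_to_space(x: Map3D, r: int) -> Map3D:
    """Rearrange an (H, W, r*r*C) map into an (H*r, W*r, C) map.

    Channel ordering follows TensorFlow's tf.nn.depth_to_space (NHWC):
    out[h*r + i][w*r + j][c] = x[h][w][(i*r + j)*C + c]
    """
    height, width, depth = len(x), len(x[0]), len(x[0][0])
    if depth % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = depth // (r * r)
    out: Map3D = [[[] for _ in range(width * r)] for _ in range(height * r)]
    for h in range(height):
        for w in range(width):
            for i in range(r):
                for j in range(r):
                    # Each group of c_out channels fills one pixel of the r x r block.
                    base = (i * r + j) * c_out
                    out[h * r + i][w * r + j] = list(x[h][w][base:base + c_out])
    return out


def argmax_labels(scores: Map3D) -> List[List[int]]:
    """Return the index of the highest-scoring class at every pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in scores]


if __name__ == "__main__":
    # Hypothetical head output: a single low-resolution pixel with r = 2 and
    # C = 2 classes, so the head must emit r*r*C = 8 channels.
    low_res = [[[0.9, 0.1, 0.2, 0.8, 0.6, 0.4, 0.3, 0.7]]]
    scores = depth_to_space(low_res, r=2)  # shape (2, 2, 2)
    print(argmax_labels(scores))  # [[0, 1], [0, 1]]
```

Because the rearrangement only moves values, it adds no learnable parameters; in this sketch all of the upsampling capacity sits in the convolution that produces the r·r·C channels.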
