DTS-Net: Depth-to-Space Networks for Fast and Accurate Semantic Object Segmentation.

Authors

Ibrahem Hatem, Salem Ahmed, Kang Hyun-Soo

Affiliations

Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Korea.

Electrical Engineering Department, Faculty of Engineering, Assiut University, Assiut 71515, Egypt.

Publication information

Sensors (Basel). 2022 Jan 3;22(1):337. doi: 10.3390/s22010337.

Abstract

We propose Depth-to-Space Net (DTS-Net), an effective technique for semantic segmentation based on the efficient sub-pixel convolutional neural network. The technique is inspired by depth-to-space (DTS) image reconstruction, which was originally used for image and video super-resolution, and combines it with Nearest Label Filtration, a mask enhancement filtration technique based on multi-label classification. The proposed architectures are built on depth-wise separable convolutions. We propose both a deep network, DTS-Net, and a lightweight network, DTS-Net-Lite, for real-time semantic segmentation; they use Xception and MobileNetV2, respectively, as feature extractors. In addition, we explore joint semantic segmentation and depth estimation and demonstrate that the proposed technique can efficiently perform both tasks simultaneously, outperforming state-of-the-art (SOTA) methods. We train and evaluate the proposed method on the PASCAL VOC2012, NYUV2, and CITYSCAPES benchmarks. The developed networks obtain high mean intersection over union (mIOU) and mean pixel accuracy (Pix.acc.) values with simple, lightweight convolutional neural network architectures. Notably, the proposed method outperforms SOTA methods that depend on encoder-decoder architectures, even though its implementation and computations are far simpler.
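The depth-to-space rearrangement that the abstract builds on can be illustrated with a minimal sketch. This is not the authors' network or code; it only shows the pixel rearrangement step, assuming a channel-first layout and the channel ordering used by common pixel-shuffle implementations. The function name `depth_to_space`, the nested-list representation, and the toy input are assumptions made for illustration.

```python
def depth_to_space(channels, r):
    """Rearrange C*r*r low-resolution channels into C high-resolution channels.

    `channels` is a list of 2-D lists (channel-first layout), each H x W.
    Assumed ordering: input channel c*r*r + i*r + j fills the output pixel
    at row offset i and column offset j inside every r x r output block.
    """
    if len(channels) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(channels[0]), len(channels[0][0])
    out_channels = len(channels) // (r * r)
    # Allocate C output channels of size (H*r) x (W*r).
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(r):
            for j in range(r):
                src = channels[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Toy input: four 1x2 channels with r=2 become one 2x4 channel.
low_res = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
high_res = depth_to_space(low_res, 2)
```

In this sketch no values are computed or interpolated during upsampling; each output pixel is copied from one input channel, which is why the operation is cheap compared with a learned decoder.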

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f27/8749585/65e76b5cfb5d/sensors-22-00337-g001.jpg
