Image Segmentation Using Encoder-Decoder with Deformable Convolutions.

Affiliation

Computer Science Department, University Politehnica of Bucharest, RO-060042 Bucharest, Romania.

Publication information

Sensors (Basel). 2021 Feb 24;21(5):1570. doi: 10.3390/s21051570.

DOI:10.3390/s21051570

PMID:33668156

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7956600/

Abstract

Image segmentation is an essential step in image analysis that brings meaning to the pixels in the image. Nevertheless, it is also a difficult task, both because no generally suitable approach to the problem exists and because real-life pictures can suffer from noise or object occlusion. This paper proposes an architecture for semantic segmentation using a convolutional neural network based on the Xception model, which was previously used for classification. Several experiments were conducted to find the best-performing configuration of the model (e.g., different input resolutions and network depths were tried, and data augmentation techniques were applied). Additionally, the network was improved by adding a deformable convolution module. The proposed architecture obtained a mean IoU of 76.8 on the Pascal VOC 2012 dataset and 58.1 on the Cityscapes dataset. It outperforms the SegNet and U-Net networks, both of which have considerably more parameters and a higher inference time.
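
The mean IoU figures quoted above follow the usual definition: for each class, the intersection of predicted and ground-truth pixels divided by their union, averaged over classes. The sketch below is a minimal illustration of that definition on flattened label lists, not the paper's evaluation code. The function name `mean_iou`, the choice to skip classes that appear in neither prediction nor ground truth, and the toy labels are all assumptions made for this example. Benchmark scripts typically accumulate counts over the whole dataset and report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    (Hypothetical helper for illustration only.)
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel is correct: it is in both the intersection and the union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Pixel is wrong: it enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Assumption: classes absent from both pred and target are skipped.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mean IoU: {score:.3f}")
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5; multiplying by 100 gives the percentage scale used for the 76.8 and 58.1 results.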
