A Residual-Inception U-Net (RIU-Net) Approach and Comparisons with U-Shaped CNN and Transformer Models for Building Segmentation from High-Resolution Satellite Images.

Affiliation

Department of Geomatics Engineering, Faculty of Civil Engineering, Istanbul Technical University, Istanbul 34469, Turkey.

Publication information

Sensors (Basel). 2022 Oct 8;22(19):7624. doi: 10.3390/s22197624.

DOI:10.3390/s22197624

PMID:36236721

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9570988/

Abstract

Building segmentation is crucial for applications ranging from map production to urban planning. It remains challenging because CNNs struggle to model global context and Transformers have high memory requirements. In this study, 10 CNN and Transformer models were implemented and compared. In addition to our proposed Residual-Inception U-Net (RIU-Net), U-Net, Residual U-Net, and Attention Residual U-Net were implemented, and four CNN architectures (Inception, Inception-ResNet, Xception, and MobileNet) were used as encoders in U-Net-based models. Lastly, two Transformer-based approaches (Trans U-Net and Swin U-Net) were also used. The Massachusetts Buildings Dataset and the Inria Aerial Image Labeling Dataset were used for training and evaluation. On the Inria dataset, RIU-Net achieved the highest IoU score, F1 score, and test accuracy, with 0.6736, 0.7868, and 92.23%, respectively. On the Massachusetts Small dataset, Attention Residual U-Net achieved the highest IoU and F1 scores, with 0.6218 and 0.7606, and Trans U-Net reached the highest test accuracy, with 94.26%. On the Massachusetts Large dataset, Residual U-Net achieved the highest IoU and F1 scores, with 0.6165 and 0.7565, and Attention Residual U-Net attained the highest test accuracy, with 93.81%. These results indicate that RIU-Net performed best on the Inria dataset, while on the Massachusetts datasets Residual U-Net, Attention Residual U-Net, and Trans U-Net produced the strongest results.
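The three metrics reported above (IoU, F1 score, and pixel accuracy) can be illustrated with a minimal sketch for binary building masks. This is not the authors' evaluation code: the function names `confusion_counts` and `segmentation_metrics` are hypothetical, masks are assumed to be flattened sequences of 0/1 values (1 = building), and the abstract does not state how scores were averaged across images, so the sketch scores a single prediction/ground-truth pair only.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equal-length binary sequences (1 = building)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    if len(pred) == 0:
        raise ValueError("masks must not be empty")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1      # building predicted, building present
        elif p and not t:
            fp += 1      # building predicted, background present
        elif t:
            fn += 1      # background predicted, building present
        else:
            tn += 1      # background predicted, background present
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return IoU, F1, and pixel accuracy for the building class.

    Assumption of this sketch: when neither mask contains any building
    pixel, IoU and F1 are reported as 1.0 (other conventions exist).
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 1.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"iou": iou, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy 8-pixel example: 2 true positives, 1 false positive,
    # 1 false negative, 4 true negatives.
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 1, 0, 0]
    print(segmentation_metrics(predicted, reference))
```

Accuracy counts correctly labelled background pixels, whereas IoU and F1 do not, which is one reason the model with the highest test accuracy on a dataset is not necessarily the one with the highest IoU or F1.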
