Zhou Dengji, Wang Guizhou, He Guojin, Long Tengfei, Yin Ranyu, Zhang Zhaoming, Chen Sibao, Luo Bin
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
Sensors (Basel). 2020 Dec 17;20(24):7241. doi: 10.3390/s20247241.
Building extraction from high-spatial-resolution remote sensing images is an active research topic in remote sensing applications and computer vision. This paper presents a supervised semantic segmentation model named the Pyramid Self-Attention Network (PISANet). Its structure is simple, containing only two parts: the backbone of the network, which learns the local features of buildings from the image (short-distance context information around each pixel); and the pyramid self-attention module, which obtains the global features (long-distance context information with respect to other pixels in the image) and the comprehensive features (including color, texture, geometric, and high-level semantic features) of the buildings. The network is end-to-end. In the training stage, the input is the remote sensing image with its corresponding label, and the output is a probability map (the probability that each pixel does or does not belong to a building). In the prediction stage, the input is the remote sensing image, and the output is the building extraction result. The complexity of the network structure is kept low so that it is easy to implement. The proposed PISANet was tested on two datasets. The overall accuracy reached 94.50% and 96.15%, the intersection-over-union reached 77.45% and 87.97%, and the F1 score reached 87.27% and 93.55%, respectively. In experiments on different datasets, PISANet achieved high overall accuracy, a low error rate, and improved completeness of individual buildings.
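The abstract does not give the internal details of the pyramid self-attention module. The following is a minimal PyTorch sketch of the general idea of combining a backbone's local features with self-attention computed at several pooled scales; the specific scales, channel reduction, and fusion scheme are assumptions for illustration, not the exact PISANet design.

```python
# Minimal sketch of a pyramid self-attention module (assumed design, not the
# exact PISANet architecture): self-attention is applied to the backbone
# feature map at several pooled scales, and the results are fused.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention2d(nn.Module):
    """Standard non-local style self-attention over an HxW feature map."""

    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or max(channels // 8, 1)
        self.query = nn.Conv2d(channels, reduced, 1)
        self.key = nn.Conv2d(channels, reduced, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x).flatten(2)                      # (B, C', HW)
        v = self.value(x).flatten(2).transpose(1, 2)    # (B, HW, C)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out


class PyramidSelfAttention(nn.Module):
    """Apply self-attention at several pooled scales and fuse (scales assumed)."""

    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attentions = nn.ModuleList(SelfAttention2d(channels) for _ in scales)
        self.fuse = nn.Conv2d(channels * (len(scales) + 1), channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [x]  # keep the local backbone features
        for s, attn in zip(self.scales, self.attentions):
            xs = F.adaptive_avg_pool2d(x, (max(h // s, 1), max(w // s, 1)))
            ys = attn(xs)  # long-distance context at this scale
            feats.append(F.interpolate(ys, size=(h, w),
                                       mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))


# Usage: global/pyramid context fused onto backbone features, followed by a
# 1x1 classifier producing the per-pixel building probability map.
features = torch.randn(1, 64, 32, 32)          # hypothetical backbone output
context = PyramidSelfAttention(64)(features)
prob_map = torch.sigmoid(nn.Conv2d(64, 1, 1)(context))
```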