Yuan Qinglie
School of Civil and Architecture Engineering, Panzhihua University, Panzhihua, China.
Sci Rep. 2025 Feb 22;15(1):6499. doi: 10.1038/s41598-025-91206-6.
Building rooftop extraction has been applied in various fields, such as cartography, urban planning, autonomous driving, and smart city construction. Automatic building detection and extraction algorithms that use high-spatial-resolution aerial images can provide precise location and geometry information, significantly reducing time, cost, and labor. Recently, deep learning algorithms, especially convolutional neural networks (CNNs) and Transformers, have shown robust local or global feature extraction ability, achieving advanced performance in intelligent interpretation compared with conventional methods. However, buildings often exhibit scale variation, spectral heterogeneity, and similarity with complex geometric shapes. Hence, the rooftop extraction results produced by these methods are often fragmented and lack spatial detail. To address these issues, this study developed a multi-scale global perceptron network based on Transformer and CNN, using novel encoder-decoder components to enhance the contextual representation of buildings. Specifically, an improved multi-head attention encoder constructs multi-scale tokens to enhance global semantic correlations. Meanwhile, a context refinement decoder combines high-level semantic representations with shallow features to restore spatial details. Overall, quantitative analysis and visual experiments confirmed that the proposed model is more efficient than, and superior to, other state-of-the-art methods, achieving a 95.18% F1 score on the WHU dataset and a 93.29% F1 score on the Massub dataset.
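For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of how a pixel-wise F1 score could be computed for a predicted building mask against a reference mask. The function name `f1_score`, the nested-list mask representation, and the toy masks are assumptions made for illustration; the abstract does not describe the evaluation code actually used in the study.

```python
def f1_score(pred, truth):
    """Pixel-wise F1 for two equally sized binary masks (nested lists of 0/1).

    Illustrative helper only; not the evaluation code used in the study.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # building pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as building
            elif t and not p:
                fn += 1      # building pixel missed
    if tp == 0:
        return 0.0           # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy 2x3 masks (hypothetical data): 1 = building, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = f1_score(pred, truth)
print(f"F1 = {score:.4f}")
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall both equal 2/3 and the F1 score is 2/3 as well.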