Pan Xuran, Xu Kexing, Yang Shuhao, Liu Yukun, Zhang Rui, He Ping
College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China.
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.
Sensors (Basel). 2025 Mar 27;25(7):2112. doi: 10.3390/s25072112.
Building extraction plays a pivotal role in enabling rapid and accurate construction of urban maps, thereby supporting urban planning, smart city development, and urban management. Buildings in remote sensing imagery exhibit diverse morphological attributes and spectral signatures, yet their reliable interpretation through single-modal data remains constrained by heterogeneous terrain conditions, occlusions, and spatially variable illumination effects inherent to complex geographical landscapes. The integration of multi-modal data for building extraction offers significant advantages by leveraging complementary features from diverse data sources. However, the heterogeneity of multi-modal data complicates effective feature extraction, while the multi-scale cross-modal feature fusion encounters a semantic gap issue. To address these challenges, a novel building extraction network based on multi-modal remote sensing data called SDA-les (AGAFMs) was designed in the decoding stage to fuse multi-modal features at various scales, which dynamically adjust the importance of features from a global perspective to better balance the semantic information. The superior performance of the proposed method is demonstrated through comprehensive evaluations on the ISPRS Potsdam dataset with 97.66% F1 score and 95.42% IoU, the ISPRS Vaihingen dataset with 96.56% F1 score and 93.35% IoU, and the DFC23 Track2 dataset with 91.35% F1 score and 84.08% IoU.
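The F1 and IoU scores reported above are both derived from pixel-level true positives, false positives, and false negatives for the building class. The sketch below is illustrative only and is not the authors' evaluation code: the function names `f1_and_iou` and `iou_from_f1` and the toy masks are hypothetical, and masks are assumed to be flat binary lists with 1 marking a building pixel. When both metrics come from the same counts, IoU = F1 / (2 - F1), and the reported pairs appear consistent with that identity.

```python
def f1_and_iou(pred, truth):
    """Return (F1, IoU) for flat binary masks where 1 marks a building pixel."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # building predicted and present
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # building predicted but absent
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # building missed
    if tp + fp + fn == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


def iou_from_f1(f1):
    """Convert F1 to IoU, valid when both use the same TP/FP/FN counts."""
    return f1 / (2 - f1)


# Toy example (hypothetical data): 3 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 1, 0, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
f1, iou = f1_and_iou(pred, truth)
print(f"F1={f1:.4f}, IoU={iou:.4f}")

# Compare the F1/IoU pairs reported in the abstract against the identity.
for name, rep_f1, rep_iou in [
    ("Potsdam", 0.9766, 0.9542),
    ("Vaihingen", 0.9656, 0.9335),
    ("DFC23 Track2", 0.9135, 0.8408),
]:
    print(name, round(iou_from_f1(rep_f1), 4), rep_iou)
```

The identity is useful as a quick sanity check on reported results; it does not hold if the two metrics are averaged over classes or images in different ways.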