Zhang Zikang, Li Gongquan
School of Geosciences, Yangtze University, Wuhan 430100, China.
Sensors (Basel). 2025 Mar 13;25(6):1786. doi: 10.3390/s25061786.
In real-time semantic segmentation of drone imagery, current lightweight algorithms fail to integrate global and local information in the image, leading to missed detections and misclassified categories. This paper proposes a real-time semantic segmentation method for drone imagery that integrates multi-scale global context information. The method adopts a UNet structure, with a ResNet18 encoder extracting features. The decoder incorporates a global-local attention module: the global branch compresses and extracts global information along the vertical and horizontal directions, while the local branch extracts local information through convolution, thereby enhancing the fusion of global and local information in the image. In the segmentation head, a shallow-feature fusion module integrates the multi-scale features extracted by the encoder, strengthening the spatial information in the shallow features. The model was tested on the UAVid and UDD6 datasets, achieving 68% mIoU (mean Intersection over Union) and 67% mIoU, respectively, 10% and 21.2% higher than the baseline model UNet. The model runs at 72.4 frames/s, 54.4 frames/s faster than the baseline UNet. The experimental results demonstrate that the proposed model strikes a good balance between accuracy and real-time performance.
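The abstract does not give the exact equations of the global-local attention module, but the description (a global branch that compresses features along the vertical and horizontal directions, a local branch that applies convolution, and a fusion of the two) suggests a strip-pooling-style design. The following is a minimal NumPy sketch under that assumption; the function name, the use of mean pooling for the directional compression, a 3x3 average in place of a learned convolution, and the sigmoid-gated fusion are all illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def global_local_attention(x: np.ndarray) -> np.ndarray:
    """Sketch of a global-local attention block on a feature map x of shape (C, H, W).

    Assumptions (not from the paper): mean pooling for the directional
    compression, a 3x3 average standing in for the local convolution,
    and a sigmoid gate for fusing the two branches.
    """
    c, h, w = x.shape

    # Global branch: compress along the horizontal and vertical directions,
    # then broadcast the two strip contexts back to the full (C, H, W) grid.
    h_ctx = x.mean(axis=2, keepdims=True)   # (C, H, 1) - row-wise context
    v_ctx = x.mean(axis=1, keepdims=True)   # (C, 1, W) - column-wise context
    global_ctx = h_ctx + v_ctx              # broadcasts to (C, H, W)

    # Local branch: 3x3 neighborhood average as a placeholder for a learned
    # convolution (edge padding keeps the spatial size unchanged).
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    local = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            local += padded[:, i:i + h, j:j + w]
    local /= 9.0

    # Fusion: the global context produces a sigmoid gate that modulates the
    # local features, and a residual connection preserves the input.
    gate = 1.0 / (1.0 + np.exp(-global_ctx))
    return x + gate * local
```

In a trained network, the mean poolings and the 3x3 average would be learned layers, but the sketch shows the key point of the description: the global branch reasons over entire rows and columns at once, while the local branch stays within a small neighborhood, and the fusion lets each pixel use both.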