Xiao Yunze, Di Nan
Academy for Advanced Interdisciplinary Studies, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, China.
Sci Rep. 2024 Oct 27;14(1):25624. doi: 10.1038/s41598-024-77513-4.
Currently, lightweight small object detection algorithms for unmanned aerial vehicles (UAVs) often employ group convolutions, resulting in high Memory Access Cost (MAC) and rendering them unsuitable for edge devices that rely on parallel computing. To address this issue, we propose the SOD-YOLO model based on YOLOv7, which incorporates a DSDM-LFIM backbone network and a small object detection branch. The DSDM-LFIM backbone network, which combines Deep-Shallow Downsampling Modules (DSD Modules) and Lightweight Feature Integration Modules (LFI Modules), avoids excessive use of group convolutions and element-wise operations. The DSD Module extracts both deep and shallow features from feature maps using fewer parameters to obtain richer feature representations. The LFI Module is a dual-branch feature integration module designed to consolidate feature information. Experimental results demonstrate that the SOD-YOLO model achieves an AP50 of 50.7% at 72.5 FPS on the VisDrone validation set. Compared to YOLOv7, our model reduces computational cost by 20.25% and the number of parameters by 17.89%. After scaling the number of channels in the model, it achieves an AP50 of 33.4% with an inference time of 27.3 ms on the Atlas 200I DK A2. These experimental results indicate that the SOD-YOLO model can effectively perform small object detection on the large volumes of aerial images captured by UAVs.
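The link between group convolutions and Memory Access Cost can be illustrated with a rough calculation. The sketch below is not taken from the paper: it assumes the commonly used cost model for a 1x1 grouped convolution from the ShuffleNet V2 analysis, where FLOPs = h·w·c_in·c_out/g and MAC = h·w·(c_in + c_out) + c_in·c_out/g. The function name `grouped_conv1x1_cost` and the feature-map sizes are hypothetical and chosen only for illustration.

```python
def grouped_conv1x1_cost(h, w, c_in, c_out, groups):
    """Return (flops, mac) for a 1x1 grouped convolution on an h x w feature map.

    Assumed cost model (ShuffleNet V2 style, not from the SOD-YOLO paper):
      flops = h * w * c_in * c_out / groups
      mac   = h * w * (c_in + c_out)      # read input map, write output map
              + c_in * c_out / groups     # read the kernel weights
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    flops = h * w * c_in * c_out // groups
    mac = h * w * (c_in + c_out) + c_in * c_out // groups
    return flops, mac


# Illustrative sizes only: a 40 x 40 feature map with 128 input and output channels.
for g in (1, 2, 4, 8):
    flops, mac = grouped_conv1x1_cost(40, 40, 128, 128, g)
    print(f"groups={g}: flops={flops}, mac={mac}, mac_per_flop={mac / flops:.4f}")
```

Under this model, increasing the number of groups cuts FLOPs roughly in proportion but leaves the activation traffic unchanged, so the memory accessed per FLOP rises. This is one plausible reading of why group-convolution-heavy designs can underperform on parallel edge hardware despite low FLOP counts.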