Li Jun, Zhang Jiajie, Shao Yanhua, Liu Feng
Artificial Intelligence Security Innovation Research, Beijing Information Science and Technology University, Beijing 100192, China.
Department of Information Security, Beijing Information Science and Technology University, Beijing 100192, China.
Sensors (Basel). 2024 Jun 17;24(12):3918. doi: 10.3390/s24123918.
Object detection in images taken by unmanned aerial vehicles (UAVs) suffers from low accuracy because objects vary widely in size and type and offer limited feature information. To address this, we present SRE-YOLOv8, which enhances the YOLOv8 object detection algorithm with the Swin Transformer and a lightweight residual feature pyramid network (RE-FPN) structure. First, we introduce an optimized Swin Transformer module into the backbone network to preserve global contextual information during feature extraction and to extract a broader range of features through self-attention. Second, we integrate a Residual Feature Augmentation (RFA) module and a lightweight attention mechanism named ECA, transforming the original FPN structure into RE-FPN and strengthening the network's emphasis on critical features. Third, we add a small object detection (SOD) layer to improve the network's ability to capture spatial information, which raises accuracy on small objects. Finally, we employ a Dynamic Head equipped with multiple attention mechanisms in the detection head to improve the identification of low-resolution targets against complex backgrounds. Experimental evaluation on the VisDrone2021 dataset shows a 9.2% improvement in detection accuracy over the original YOLOv8 algorithm.
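The abstract names ECA as the lightweight attention mechanism used in RE-FPN but does not describe its internals. As an illustration only, the following is a minimal pure-Python sketch of channel attention in the style of the ECA-Net formulation: global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. The function names `eca_kernel_size` and `eca` are hypothetical, the nested-list tensor layout is chosen for readability, and the convolution weights are passed in by the caller, whereas a real network would learn them. How the authors wire ECA into RE-FPN is not stated here and is not reproduced.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size, following the heuristic from the ECA-Net paper.

    gamma and b are that paper's defaults; whether SRE-YOLOv8 uses the same
    values is an assumption.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Apply ECA-style channel attention to a [C][H][W] nested list.

    weights is a 1D kernel of odd length k. It is learned in a real network;
    here the caller supplies it (illustrative assumption).
    Returns (rescaled_feature_map, per_channel_gates).
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2

    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]

    # 2) 1D convolution across neighbouring channels, zero-padded so the
    #    output has one value per input channel.
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[i + j] for j, w in enumerate(weights))
        for i in range(len(pooled))
    ]

    # 3) Sigmoid turns each mixed descriptor into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4) Rescale every spatial position of a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates
```

Because the only parameters are the k kernel weights, the gate adds very little cost compared with attention blocks that use fully connected layers, which is consistent with the abstract's description of ECA as lightweight.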