YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery.

Affiliation

College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.

Publication information

Sensors (Basel). 2023 Feb 24;23(5):2522. doi: 10.3390/s23052522.

DOI: 10.3390/s23052522

PMID: 36904727

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10007093/

Abstract

Deep learning methods for object detection in natural images have made tremendous progress in recent decades. However, because aerial images contain multiscale targets, complex backgrounds, and a high proportion of small targets, methods developed for natural images frequently fail to produce satisfactory results when applied to them. To address these problems, we proposed DET-YOLO, an enhancement of YOLOv4. First, we employed a vision transformer to extract global information effectively. In the transformer, we replaced linear embedding with deformable embedding, and the standard feedforward network with a fully convolutional feedforward network (FCFN), in order to reduce the feature loss caused by patch cutting during embedding and to improve spatial feature extraction. Second, to improve multiscale feature fusion in the neck, we replaced the feature pyramid network with a depthwise separable deformable pyramid (DSDP) module. Experiments on the DOTA, RSOD, and UCAS-AOD datasets showed that our method achieved mean average precision (mAP) values of 0.728, 0.952, and 0.945, respectively, comparable to existing state-of-the-art methods.
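The abstract does not describe the internals of the DSDP module. Assuming "depthwise separable" carries its usual meaning (a per-channel k × k depthwise convolution followed by a 1 × 1 pointwise convolution that mixes channels), the sketch below counts weights to show why such a layer is much cheaper than a standard convolution. The function names and the channel and kernel sizes are illustrative assumptions, not taken from the paper, and the deformable sampling part of DSDP is not modelled.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise k x k conv (one filter per input channel)
    followed by a pointwise 1 x 1 conv that mixes channels (bias omitted)."""
    depthwise = c_in * k * k      # spatial filtering, channel by channel
    pointwise = c_in * c_out      # 1 x 1 channel mixing
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    c_in, c_out, k = 256, 256, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(std, sep, round(std / sep, 2))  # prints: 589824 67840 8.69
```

With these sizes the separable layer needs roughly 8.7× fewer weights. In general the ratio is c_out·k² / (k² + c_out), which approaches k² as the number of output channels grows.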
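The results above are reported as mAP. As a hedged illustration of what that metric computes, the sketch below implements all-point interpolated average precision (the PASCAL VOC 2010+ convention) per class and averages it over classes. It assumes each detection has already been labelled a true or false positive by IoU matching against ground truth; the abstract does not state the exact evaluation protocol (IoU threshold, interpolation scheme, oriented versus horizontal boxes), and the class names in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence, is_true_positive)


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """All-point interpolated AP for one class."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Pad the curve, then make precision non-increasing in recall.
    r = [0.0] + recalls + [1.0]
    p = [0.0] + precisions + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Area under the interpolated curve (zero-width steps add nothing).
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """mAP: unweighted mean of per-class AP (per_class must be non-empty)."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    example = {
        "plane": ([(0.9, True), (0.8, False), (0.7, True)], 2),  # AP = 5/6
        "ship": ([(0.95, True)], 1),                             # AP = 1
    }
    print(round(mean_average_precision(example), 4))  # prints: 0.9167
```

Interpolating precision to its maximum at any higher recall removes the zig-zag of the raw precision-recall curve, so AP does not penalise a detector for the order of near-tied detections.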
