Object Detection for UAV Aerial Scenarios Based on Vectorized IOU.

Affiliations

College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China.

Bijie 5G Innovation and Application Research Institute, Guizhou University of Engineering Science, Bijie 551700, China.

Publication Information

Sensors (Basel). 2023 Mar 13;23(6):3061. doi: 10.3390/s23063061.

Abstract

Object detection in unmanned aerial vehicle (UAV) images is an extremely challenging task that involves problems such as multi-scale objects, a high proportion of small objects, and heavy overlap between objects. To address these issues, we first design a Vectorized Intersection Over Union (VIOU) loss based on YOLOv5s. This loss treats the width and height of the bounding box as a vector to construct a cosine function that reflects both the size and the aspect ratio of the box, and it directly compares the center-point values of the boxes, improving the accuracy of bounding-box regression. Second, we propose a Progressive Feature Fusion Network (PFFN) that addresses the insufficient semantic extraction of shallow features in PANet. It allows each node of the network to fuse semantic information from deeper layers with features from the current layer, significantly improving the detection of small objects in multi-scale scenes. Finally, we propose an Asymmetric Decoupled (AD) head, which separates the classification network from the regression network and improves both the classification and regression capabilities of the network. Compared with YOLOv5s, our method yields significant improvements on two benchmark datasets: on the VisDrone 2019 dataset, performance increases by 9.7 percentage points, from 34.9% to 44.6%, and on the DOTA dataset it increases by 2.1%.
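To make the VIOU idea concrete, the following is a minimal PyTorch sketch of how such a loss could be assembled from the abstract's description: an IoU term, a cosine term computed from the (w, h) vectors of the predicted and ground-truth boxes, and a direct comparison of the box centers. The function name viou_loss, the DIoU-style normalization of the center distance, and the equal weighting of the terms are assumptions made for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def viou_loss(pred, target, eps=1e-7):
    """VIOU-style box regression loss (illustrative sketch, not the authors' code).

    pred, target: (N, 4) tensors of boxes in (cx, cy, w, h) format.
    Returns a per-box loss of shape (N,).
    """
    # Convert center format to corner format for the IoU computation.
    p_x1, p_y1 = pred[:, 0] - pred[:, 2] / 2, pred[:, 1] - pred[:, 3] / 2
    p_x2, p_y2 = pred[:, 0] + pred[:, 2] / 2, pred[:, 1] + pred[:, 3] / 2
    t_x1, t_y1 = target[:, 0] - target[:, 2] / 2, target[:, 1] - target[:, 3] / 2
    t_x2, t_y2 = target[:, 0] + target[:, 2] / 2, target[:, 1] + target[:, 3] / 2

    # Plain IoU term.
    inter_w = (torch.min(p_x2, t_x2) - torch.max(p_x1, t_x1)).clamp(min=0)
    inter_h = (torch.min(p_y2, t_y2) - torch.max(p_y1, t_y1)).clamp(min=0)
    inter = inter_w * inter_h
    union = pred[:, 2] * pred[:, 3] + target[:, 2] * target[:, 3] - inter + eps
    iou = inter / union

    # Cosine similarity between the (w, h) vectors: equals 1 when the two
    # boxes share the same aspect ratio and decreases as the ratios diverge.
    wh_cos = F.cosine_similarity(pred[:, 2:4], target[:, 2:4], dim=1, eps=eps)

    # Direct center-point comparison, normalized by the diagonal of the
    # smallest enclosing box (a DIoU-style choice assumed for illustration).
    enc_w = torch.max(p_x2, t_x2) - torch.min(p_x1, t_x1)
    enc_h = torch.max(p_y2, t_y2) - torch.min(p_y1, t_y1)
    center_dist = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    center_penalty = center_dist / (enc_w ** 2 + enc_h ** 2 + eps)

    # Equal weighting of the three terms is a placeholder choice.
    return 1.0 - iou + (1.0 - wh_cos) + center_penalty

# Usage sketch: two predicted boxes matched to two ground-truth boxes.
# pred = torch.tensor([[50., 50., 20., 40.], [10., 10., 8., 8.]])
# gt   = torch.tensor([[52., 48., 22., 38.], [12., 12., 8., 8.]])
# loss = viou_loss(pred, gt).mean()

In this sketch the cosine term penalizes only aspect-ratio mismatch, while box size enters through the IoU term; the paper's construction may combine the (w, h) vectors differently so that the cosine function also reflects box size, as the abstract states.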

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/89aa/10054878/cbf545ac17cb/sensors-23-03061-g001.jpg
