A lightweight small object detection model for UAV images based on deep semantic integration.

Authors

Chao Manxin, Peng Can, Yun Lijun, Zhang Chunjie, Wang Huihua, Chen Zaiqing

Affiliations

The School of Information, Yunnan Normal University, Kunming, 650500, Yunnan, China.

Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education of Yunnan Province, Kunming, 650500, Yunnan, China.

Publication information

Sci Rep. 2025 Aug 29;15(1):31888. doi: 10.1038/s41598-025-16878-6.

PMID:40883434

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12397394/

Abstract

Most existing small object detection methods rely on residual blocks to process deep feature maps. However, these residual blocks, composed of multiple large-kernel convolution layers, incur high computational costs and carry redundant information, which makes it difficult to improve detection performance on small objects. To address this, we design an improved feature pyramid network, the L Feature Pyramid Network (L-FPN), which reconstructs the original FPN structure to optimize how computational resources are allocated for small object detection. Building on L-FPN, we propose a small object detector named BPD-YOLO. We introduce a Dual-phase Asymptotic Feature Fusion mechanism (DAFF), in which the shallow and deep semantic features extracted from the backbone network are first fused in parallel to mitigate the semantic gap between them; the intermediate semantic layers are then integrated progressively, so that shallow and deep feature representations are fused effectively. We also design a Deep Spatial Pyramid Fusion module (DSPF), which generates multi-scale feature representations in place of conventional stacked residual blocks, thereby reducing computational overhead. In the shallow feature extraction stage, DSPF focuses on semantic integration and enhances the extraction of small object features. This strategy of adaptively selecting different modules according to feature-map resolution is called the Decoupled feature Extraction-semantic Integration mechanism (DEI). Finally, we conduct extensive experiments and thorough evaluations on both the VisDrone and TinyPerson datasets. On VisDrone, compared with the baseline model YOLOv8n + p2, our BPD-YOLO model with L-FPN improves mAP50 by 2.8% and mAP50-95 by 1.4%. On TinyPerson, BPD-YOLO further demonstrates its strength in high-resolution feature extraction, improving detection accuracy while significantly reducing computational cost.
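The abstract does not describe DSPF's internals. As a hedged illustration of the general idea it names, producing multi-scale representations from a single feature map instead of stacking residual blocks, the sketch below implements spatial-pyramid-style max pooling (stride-1 pooling at several kernel sizes, as in spatial pyramid pooling blocks used in YOLO necks) in plain Python. The kernel sizes (5, 9, 13), the single-channel list-of-lists layout, and the names `max_pool_same`, `spatial_pyramid` and `conv_params` are all hypothetical choices for illustration; this is not the authors' DSPF.

```python
from typing import List, Tuple

# Hypothetical layout: one single-channel feature map as rows of floats.
Map = List[List[float]]


def max_pool_same(fmap: Map, k: int) -> Map:
    """Stride-1 k x k max pooling with 'same' padding, so the output keeps the input size."""
    if k % 2 != 1:
        raise ValueError("use an odd kernel so the window is centred on each pixel")
    r = k // 2
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Clipping the window at the borders is equivalent to padding with -inf.
            out[i][j] = max(
                fmap[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
    return out


def spatial_pyramid(fmap: Map, kernels: Tuple[int, ...] = (5, 9, 13)) -> List[Map]:
    """Hypothetical multi-scale block: the input plus one pooled copy per kernel size.

    In a real network these maps would be concatenated along the channel axis.
    """
    return [fmap] + [max_pool_same(fmap, k) for k in kernels]


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Learnable weights in one k x k convolution (bias ignored)."""
    return k * k * c_in * c_out
```

The pooled branches carry no learnable weights, whereas every k x k convolution in a residual stack carries k²·C_in·C_out of them: `conv_params(3, 256, 256)` is 589,824 for a single 3 x 3 layer. In practice a 1 x 1 convolution would typically fuse the concatenated branches; the sketch leaves that out.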

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e2b5/12397394/b9367220623c/41598_2025_16878_Figa_HTML.jpg
