Chen YuZhi, Sun HaoYue, Tian Liang, Yang Ye, Wang ShenYang, Wang TianYou
Hebei University of Architecture, Zhangjiakou, China.
Hebei Digital Education Collaborative Innovation, Shijiazhuang, China.
PLoS One. 2025 Aug 18;20(8):e0330074. doi: 10.1371/journal.pone.0330074. eCollection 2025.
Infrared unmanned aerial vehicle (UAV) detection for surveillance applications faces three conflicting requirements: accurate detection of pixel-level thermal signatures, real-time processing capabilities, and deployment feasibility on resource-constrained edge devices. Current deep learning approaches typically optimize for one or two of these objectives while compromising the third.
This paper presents YOLO11-AU-IR, a lightweight instance segmentation framework that addresses these challenges through three architectural innovations. First, Efficient Adaptive Downsampling (EADown) employs dual-branch processing with grouped convolutions to preserve small-target spatial features during multi-scale fusion. Second, HeteroScale Attention Network (HSAN) implements grouped multi-scale convolutions with joint channel-spatial attention mechanisms for enhanced cross-scale feature representation. These architectural optimizations collectively reduce computational requirements while maintaining detection accuracy. Third, Adaptive Threshold Focal Loss (ATFL) introduces epoch-adaptive parameter tuning to address the extreme foreground-background imbalance inherent in infrared UAV imagery.
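The dual-branch downsampling idea behind EADown can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the real EADown uses learned strided grouped convolutions, while here a strided slice stands in for the convolutional branch and a 2x2 max pool forms the second branch. The sketch only shows why a pooling branch helps preserve small-target responses that plain strided downsampling can drop.

```python
import numpy as np

def eadown_sketch(x):
    """Hypothetical dual-branch 2x downsampling, (C, H, W) -> (2C, H/2, W/2).

    Branch A: strided sampling (a stand-in for a strided grouped convolution).
    Branch B: 2x2 max pooling, which keeps the peak response of a small hot
    target even when it falls between the strided sampling positions.
    The real EADown fuses learned grouped-convolution features; this sketch
    only demonstrates the dual-branch topology.
    """
    c, h, w = x.shape
    a = x[:, ::2, ::2]                                        # strided branch
    b = x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))   # max-pool branch
    return np.concatenate([a, b], axis=0)                     # channel-wise fusion

# A single hot pixel at an odd coordinate: the strided branch misses it
# entirely, but the max-pool branch carries it into the downsampled map.
x = np.zeros((1, 4, 4))
x[0, 1, 1] = 5.0
out = eadown_sketch(x)
```

In this toy case the strided half of the output is all zeros while the pooled half retains the 5.0 response, which is the failure mode EADown's second branch is designed to avoid for pixel-level thermal signatures.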
YOLO11-AU-IR is evaluated on the AUVD-Seg300 dataset, achieving 97.7% mAP@0.50 and 75.2% mAP@0.50:0.95, surpassing the YOLO11n-seg baseline by 1.7% and 4.4%, respectively. The model reduces parameters by 24.5% and GFLOPs by 11.8% compared to YOLO11n-seg, while maintaining real-time inference at 59.8 FPS on an NVIDIA RTX 3090 with low variance. On the NVIDIA Jetson TX2, under INT8 CPU-only deployment, YOLO11-AU-IR retains 95% mAP@0.50 with minimal memory footprint and stable performance, demonstrating its practical edge compatibility. Ablation studies further confirm the complementary contributions of EADown, HSAN, and ATFL in enhancing accuracy, robustness, and efficiency. Code and dataset are publicly available at https://github.com/chen-yuzhi/YOLO11-AU-IR.
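The ATFL idea above can be sketched as a standard binary focal loss whose focusing parameter follows an epoch-dependent schedule. The linear schedule below is a hypothetical stand-in, since the abstract does not give ATFL's exact adaptation rule; it only illustrates the mechanism of down-weighting abundant easy background pixels while the focusing strength changes over training.

```python
import math

def focal_loss(p, y, gamma):
    """Binary focal loss: down-weights easy examples by (1 - p_t)^gamma."""
    pt = p if y == 1 else 1.0 - p
    pt = min(max(pt, 1e-7), 1.0 - 1e-7)
    return -((1.0 - pt) ** gamma) * math.log(pt)

def adaptive_gamma(epoch, total_epochs, gamma_start=2.0, gamma_end=0.5):
    """Hypothetical linear epoch schedule for the focusing parameter:
    strong focusing early, when rare foreground (UAV) pixels must dominate
    the gradient against a vast background, relaxed as training converges.
    The paper's actual ATFL schedule may differ.
    """
    t = epoch / max(total_epochs - 1, 1)
    return gamma_start + t * (gamma_end - gamma_start)

# A hard foreground pixel (p = 0.2, y = 1) vs. an easy background pixel
# (p = 0.05, y = 0) under the early-training focusing strength.
hard_early = focal_loss(0.2, 1, adaptive_gamma(0, 100))
easy_early = focal_loss(0.05, 0, adaptive_gamma(0, 100))
```

With gamma = 2 the easy background pixel contributes a near-zero loss while the hard foreground pixel still receives a large penalty; lowering gamma late in training weakens this down-weighting, letting well-classified examples contribute more as the imbalance problem eases.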