Integrated neural network framework for multi-object detection and recognition using UAV imagery.
Author information
Alshehri Mohammed, Xue Tingting, Mujtaba Ghulam, AlQahtani Yahya, Almujally Nouf Abdullah, Jalal Ahmad, Liu Hui
Affiliations
Department of Computer Science, King Khalid University, Abha, Saudi Arabia.
School of Environmental Science & Engineering, Nanjing University of Information Science and Technology, Nanjing, China.
Publication information
Front Neurorobot. 2025 Jul 30;19:1643011. doi: 10.3389/fnbot.2025.1643011. eCollection 2025.
INTRODUCTION
Accurate vehicle analysis from aerial imagery has become increasingly vital for emerging technologies and public-service applications such as intelligent traffic management, urban planning, autonomous navigation, and military surveillance. However, analyzing UAV-captured video poses several inherent challenges, including the small size of target vehicles, occlusion, cluttered urban backgrounds, motion blur, and fluctuating lighting conditions, all of which degrade the accuracy and consistency of conventional perception systems. To address these complexities, our research proposes a fully end-to-end, deep learning-driven perception pipeline specifically optimized for UAV-based traffic monitoring. The proposed framework integrates multiple advanced modules: RetinexNet for preprocessing, HRNet for segmentation that preserves high-resolution semantic information, and the YOLOv11 framework for vehicle detection. Deep SORT is employed for efficient vehicle tracking, while CSRNet enables high-density vehicle counting. LSTM networks predict vehicle trajectories from temporal patterns, and a combination of DenseNet and SuperPoint provides robust feature extraction. Finally, classification is performed using Vision Transformers (ViTs), whose attention mechanisms ensure accurate recognition across diverse categories. The modular yet unified architecture is designed to handle spatiotemporal dynamics, making it suitable for real-time deployment on diverse UAV platforms.
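To make the flow of data through these modules concrete, the sketch below shows one way the stages could be composed per frame in Python. The class name UAVPerceptionPipeline and its fields are hypothetical placeholders standing in for the named models; this illustrates the pipeline's structure under our assumptions, not the authors' released implementation.

```python
# Minimal sketch of the modular perception pipeline described above.
# Each callable stands in for the corresponding named model (RetinexNet,
# HRNet, YOLOv11, Deep SORT, CSRNet, LSTM, DenseNet+SuperPoint, ViT);
# all names here are illustrative assumptions, not published code.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class UAVPerceptionPipeline:
    """Chains the per-frame stages in the order the paper lists them."""
    enhance: Callable   # RetinexNet-style illumination normalization
    segment: Callable   # HRNet semantic segmentation (vehicle vs. background)
    detect: Callable    # YOLOv11 vehicle detector -> bounding boxes
    track: Callable     # Deep SORT -> per-vehicle identities over time
    count: Callable     # CSRNet density-based vehicle counting
    predict: Callable   # LSTM trajectory forecast from a track's history
    describe: Callable  # DenseNet + SuperPoint features fused by an autoencoder
    classify: Callable  # Vision Transformer vehicle classifier

    def process(self, frame: Any) -> dict:
        enhanced = self.enhance(frame)       # normalize lighting first
        masks = self.segment(enhanced)       # separate vehicles from background
        boxes = self.detect(enhanced)        # locate vehicles
        tracks = self.track(boxes)           # associate detections over time
        return {
            "masks": masks,
            "tracks": tracks,
            "count": self.count(enhanced),
            "trajectories": [self.predict(t) for t in tracks],
            "labels": [self.classify(self.describe(enhanced, t)) for t in tracks],
        }
```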
METHOD
The framework combines state-of-the-art neural networks, each chosen for a distinct sub-problem in aerial vehicle analysis. RetinexNet normalizes the illumination of each input frame during preprocessing. HRNet performs semantic segmentation, accurately separating vehicles from their surroundings. YOLOv11 provides fast, high-precision vehicle detection, and Deep SORT maintains reliable tracking without losing individual vehicle identities. CSRNet handles vehicle counting that remains robust to occlusion and dense traffic. LSTM models capture each vehicle's motion over time to forecast future positions. During feature extraction, DenseNet and SuperPoint embeddings are fused and refined with an autoencoder. Finally, attention-based Vision Transformer models classify vehicles viewed from above. Each component is developed and integrated to deliver improved performance in real-world UAV deployments.
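As one concrete example of a single stage, the trajectory-forecasting step could look like the following PyTorch sketch: a small LSTM that maps a track's past (x, y) box centers to its next position. The layer sizes, one-step output, and residual formulation are illustrative assumptions; the paper does not specify these details.

```python
# Hedged sketch of the trajectory-forecasting stage only; hyperparameters
# and the residual one-step output are assumptions for illustration.

import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # predict the next (x, y) offset

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, timesteps, 2) past box centers from Deep SORT tracks
        out, _ = self.lstm(history)
        # last hidden state -> offset added to the last observed position
        return history[:, -1, :] + self.head(out[:, -1, :])

# Usage: eight past positions for one track -> one predicted next position.
track_history = torch.randn(1, 8, 2)
next_xy = TrajectoryLSTM()(track_history)  # shape (1, 2)
```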
RESULTS
Our proposed framework significantly improves the accuracy, reliability, and efficiency of vehicle analysis from UAV imagery. The pipeline was rigorously evaluated on two widely used datasets, AU-AIR and Roundabout. On the AU-AIR dataset, the system achieved a detection accuracy of 97.8%, a tracking accuracy of 96.5%, and a classification accuracy of 98.4%. Similarly, on the Roundabout dataset, it reached 96.9% detection accuracy, 94.4% tracking accuracy, and 97.7% classification accuracy. These results surpass previous benchmarks, demonstrating the system's robust performance across diverse aerial traffic scenarios. The integration of advanced models (YOLOv11 for detection, HRNet for segmentation, Deep SORT for tracking, CSRNet for counting, LSTM for trajectory prediction, and Vision Transformers for classification) enables the framework to maintain high accuracy even under challenging conditions such as occlusion, variable lighting, and scale variation.
DISCUSSION
The results show that the proposed deep learning system is capable of handling the challenges of aerial vehicle analysis and delivers reliable, precise results across all of the aforementioned tasks. Combining several advanced models ensures that the system performs robustly even under difficulties such as occlusion and scale variation.