Using Deep Learning and Low-Cost RGB and Thermal Cameras to Detect Pedestrians in Aerial Images Captured by Multirotor UAV.

Affiliation

Computing Systems Engineering Laboratory (LESC), Federal University of Technology-Parana (UTFPR), Curitiba 80230-901, Brazil.

Publication information

Sensors (Basel). 2018 Jul 12;18(7):2244. doi: 10.3390/s18072244.

DOI:10.3390/s18072244

PMID:30002290

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6068987/

Abstract

The use of Unmanned Aerial Vehicles (UAVs) has increased over the last few years across many applications, mainly because of the decreasing cost of this technology. UAVs are now used in several civilian applications, such as surveillance and search and rescue. Automatic detection of pedestrians in aerial images is a challenging task. The computer vision system must deal with many sources of variability in the aerial images captured with the UAV, e.g., low-resolution images of pedestrians, images captured at different angles because of the UAV's degrees of freedom of movement, and instability of the camera platform during flight. In this work, we created and evaluated different implementations of Pattern Recognition Systems (PRS) aimed at the automatic detection of pedestrians in aerial images captured with a multirotor UAV. The main goal is to assess the feasibility and suitability of distinct PRS implementations running on low-cost computing platforms, e.g., single-board computers such as the Raspberry Pi or regular laptops without a GPU. To that end, we used four machine learning techniques in the feature extraction and classification steps, namely Haar cascade, LBP cascade, HOG + SVM and Convolutional Neural Networks (CNN). To improve the system performance (especially the processing time) and to decrease the rate of false alarms, we applied the Saliency Map (SM) and Thermal Image Processing (TIP) within the segmentation and detection steps of the PRS. The classification results show the CNN to be the best technique with 99.7% accuracy, followed by HOG + SVM with 92.3%. In situations of partial occlusion, the CNN showed 71.1% sensitivity, which can be considered a good result in comparison with the current state of the art, since part of the original image data is missing.
As demonstrated in the experiments, by combining TIP with CNN, the PRS can process more than two frames per second (fps), whereas the PRS that combines TIP with HOG + SVM was able to process 100 fps. Our experiments also show that a trade-off analysis must be performed during the design of a pedestrian detection PRS, because the faster implementations lead to a decrease in the PRS accuracy. For instance, by using HOG + SVM with TIP, the PRS presented the best processing-time results, but the obtained accuracy was 35 percentage points lower than that of the CNN. The obtained results indicate that the best detection technique (i.e., the CNN) requires more computational resources to reduce the PRS computation time. This work therefore presents and discusses the pros and cons of each technique and the resulting trade-offs, so that such an analysis can be used to improve and tailor the design of a PRS to detect pedestrians in aerial images.
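
The abstract does not describe how TIP is implemented, so the following is only a minimal sketch of the general idea, under stated assumptions: warm pixels in a thermal frame are thresholded and grouped into connected blobs, and the expensive classifier (e.g., the CNN) then runs only on those candidate regions instead of on every sliding window. The names `hot_regions`, `detect`, `threshold` and `min_pixels`, as well as the toy frame values, are hypothetical and are not taken from the paper.

```python
from collections import deque


def hot_regions(thermal, threshold, min_pixels=4):
    """Return bounding boxes (top, left, bottom, right) of warm blobs.

    Assumption: `thermal` is a 2D list of intensity values and a person
    appears as a 4-connected group of pixels at or above `threshold`.
    """
    rows, cols = len(thermal), len(thermal[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or thermal[r][c] < threshold:
                continue
            # Breadth-first flood fill over 4-connected warm pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and thermal[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Discard tiny blobs, which are more likely noise than a person.
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes


def detect(thermal, classify, threshold, min_pixels=4):
    """Run the (expensive) classifier only on warm candidate regions."""
    candidates = hot_regions(thermal, threshold, min_pixels)
    return [box for box in candidates if classify(box)]


# Toy frame: one 2x2 warm blob and one isolated warm pixel.
frame = [
    [20, 20, 20, 20, 20, 20],
    [20, 35, 36, 20, 20, 20],
    [20, 34, 37, 20, 20, 33],
    [20, 20, 20, 20, 20, 20],
]
boxes = hot_regions(frame, threshold=30)
print(boxes)
```

On this toy frame only the 2x2 blob survives the size filter, so a classifier passed to `detect` would be called once rather than once per window position. That reduction in classifier calls is the kind of saving that could explain the processing-time gains the abstract attributes to TIP, although the actual pipeline in the paper may differ.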
