Maitre Guillaume, Martinot Dimitri, Tuci Elio
Faculty of Computer Science, University of Namur, Namur, Belgium.
Qualitics SRL, Ans, Belgium.
Front Robot AI. 2024 Apr 26;11:1378149. doi: 10.3389/frobt.2024.1378149. eCollection 2024.
This paper focuses on the design of Convolutional Neural Networks to visually guide an autonomous Unmanned Aerial Vehicle (UAV) required to inspect power towers. The network is required to precisely segment images taken by a camera mounted on the UAV, so that a motion module can generate collision-free and inspection-relevant manoeuvres of the UAV along different types of towers. The image-segmentation process is particularly challenging, not only because of the different structures of the towers but also because of the enormous variability of the background, which can range from the uniform blue of the sky to the multi-colour complexity of a rural, forest, or urban area. To train networks robust enough to deal with this task variability, without incurring the labour-intensive and costly annotation of physical-world images, we have carried out a comparative study evaluating the performance of networks trained either with synthetic images (i.e., the synthetic dataset), physical-world images (i.e., the physical-world dataset), or a combination of these two types of images (i.e., the hybrid dataset). The network used is an attention-based U-NET. The synthetic images are created using photogrammetry, to accurately model power towers, and simulated environments modelling a UAV inspecting different power towers in different settings. Our findings reveal that the network trained on the hybrid dataset outperforms the networks trained on the synthetic and the physical-world image datasets. Most notably, the network trained on the hybrid dataset demonstrates superior performance on multiple evaluation metrics related to the image-segmentation task. This suggests that the combination of synthetic and physical-world images represents the best trade-off to minimise the costs of capturing and annotating physical-world images while maximising task performance.
Moreover, the results of our study demonstrate the potential of photogrammetry in creating effective training datasets for networks that automate the precise movement of visually guided UAVs.
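The abstract does not detail the attention mechanism of the U-NET, but attention-based U-NET variants typically insert an additive attention gate on each skip connection, so that decoder features suppress background regions before concatenation. The sketch below is a minimal NumPy illustration of such a gate, not the authors' implementation; the 1×1 convolutions are reduced to per-channel linear maps, and all names (`attention_gate`, `Wx`, `Wg`, `psi`) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function, mapping scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, psi):
    """Additive attention gate on a U-Net skip connection (illustrative sketch).

    x   : skip-connection features from the encoder, shape (C, H, W)
    g   : gating signal from the decoder, shape (C, H, W)
    Wx  : (C_int, C) linear map standing in for a 1x1 conv on x
    Wg  : (C_int, C) linear map standing in for a 1x1 conv on g
    psi : (1, C_int) linear map producing one attention score per pixel
    """
    # Additive attention: combine the two projections, then ReLU.
    q = np.maximum(np.tensordot(Wx, x, axes=1) + np.tensordot(Wg, g, axes=1), 0.0)
    # Per-pixel attention coefficients in (0, 1), shape (1, H, W).
    alpha = sigmoid(np.tensordot(psi, q, axes=1))
    # Re-weight the skip features: background pixels are attenuated.
    return x * alpha
```

Because every attention coefficient lies strictly in (0, 1), the gated features are always elementwise attenuated copies of the skip features; the decoder then concatenates them with its upsampled features as in a standard U-Net.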