Deep Reinforcement Learning for End-to-End Local Motion Planning of Autonomous Aerial Robots in Unknown Outdoor Environments: Real-Time Flight Experiments

Authors

Doukhi Oualid, Lee Deok-Jin

Affiliations

Center for Artificial Intelligence & Autonomous Systems, Kunsan National University, 558 Daehak-ro, Naun 2(i)-dong, Gunsan 54150, Jeollabuk-do, Korea.

School of Mechanical Design Engineering, Smart e-Mobility Lab, Center for Artificial Intelligence & Autonomous Systems, Jeonbuk National University, 567, Baekje-daero, Deokjin-gu, Jeonju-si 54896, Jeollabuk-do, Korea.

Publication information

Sensors (Basel). 2021 Apr 4;21(7):2534. doi: 10.3390/s21072534.

DOI: 10.3390/s21072534

PMID: 33916624

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8038595/

Abstract

Autonomous navigation and collision avoidance missions represent a significant challenge for robotic systems, as they generally operate in dynamic environments that require a high level of autonomy and flexible decision-making capabilities. This challenge is more acute for micro aerial vehicles (MAVs) because of their limited size and computational power. This paper presents a novel approach that enables a micro aerial vehicle system equipped with a laser range finder to autonomously navigate among obstacles and reach a user-specified goal location in a GPS-denied environment, without the need for mapping or path planning. The proposed system uses an actor-critic-based reinforcement learning technique to train the aerial robot in a Gazebo simulator to perform a point-goal navigation task by directly mapping the noisy MAV state and laser scan measurements to continuous motion control. The obtained policy can perform collision-free flight in the real world despite being trained entirely in a 3D simulator. Extensive simulations and real-time experiments were conducted and compared with a nonlinear model predictive control technique to show the policy's generalization to previously unseen environments and its robustness to localization noise. The results demonstrate that the system flies safely and reaches the desired points by planning smooth forward linear velocities and heading rates.

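
As a rough illustration of the end-to-end mapping described in the abstract, the sketch below shows how an actor network could turn a laser scan and a goal-relative state into a bounded forward velocity and heading rate. It is not the authors' implementation: the beam count, layer width, velocity limits, observation layout, and all function names are assumptions chosen for illustration, the weights are random rather than trained, and the critic and the training loop are omitted.

```python
import math
import random

# All sizes and limits below are illustrative assumptions, not values from the paper.
NUM_BEAMS = 16       # hypothetical number of down-sampled laser beams
MAX_RANGE = 10.0     # hypothetical laser range limit in metres
V_MAX = 1.0          # hypothetical forward speed limit in m/s
YAW_RATE_MAX = 0.8   # hypothetical heading rate limit in rad/s
HIDDEN = 32          # hypothetical hidden layer width


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply one dense layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def build_observation(ranges, goal_distance, goal_bearing):
    """Concatenate normalised laser ranges with the goal relative to the robot."""
    scan = [min(max(r, 0.0), MAX_RANGE) / MAX_RANGE for r in ranges]
    return scan + [goal_distance / MAX_RANGE, goal_bearing / math.pi]


def actor(observation, params):
    """Map an observation to (forward velocity, heading rate) within fixed bounds."""
    hidden_layer, output_layer = params
    h = dense(observation, hidden_layer, math.tanh)
    raw_v, raw_w = dense(h, output_layer, lambda z: z)
    # Sigmoid keeps the forward velocity between 0 and V_MAX (no backward flight).
    forward_velocity = V_MAX / (1.0 + math.exp(-raw_v))
    # Tanh keeps the heading rate between -YAW_RATE_MAX and +YAW_RATE_MAX.
    heading_rate = YAW_RATE_MAX * math.tanh(raw_w)
    return forward_velocity, heading_rate


# Usage with untrained random weights and a synthetic scan.
rng = random.Random(0)
params = (init_layer(NUM_BEAMS + 2, HIDDEN, rng), init_layer(HIDDEN, 2, rng))
scan = [rng.uniform(0.2, MAX_RANGE) for _ in range(NUM_BEAMS)]
obs = build_observation(scan, goal_distance=4.0, goal_bearing=0.5)
v, w = actor(obs, params)
```

Squashing the two outputs with a sigmoid and a tanh is one common way to keep continuous commands inside actuator limits. Whether the paper bounds its actions this way is not stated in the abstract.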