Xie Jingyi, Peng Xiaodong, Wang Haijiao, Niu Wenlong, Zheng Xiao
Key Laboratory of Electronics and Information Technology for Space System, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
Sensors (Basel). 2020 Oct 1;20(19):5630. doi: 10.3390/s20195630.
Unmanned aerial vehicle (UAV) autonomous tracking and landing plays an increasingly important role in military and civil applications. In particular, machine learning has been successfully introduced to robotics-related tasks. A novel UAV autonomous tracking and landing approach based on a deep reinforcement learning strategy is presented in this paper, with the aim of dealing with the UAV motion control problem in an unpredictable and harsh environment. Instead of building a prior model and inferring the landing actions from heuristic rules, a model-free method based on a partially observable Markov decision process (POMDP) is proposed. In the POMDP model, the UAV automatically learns the landing maneuver via an end-to-end neural network, which combines the Deep Deterministic Policy Gradient (DDPG) algorithm with heuristic rules. A Modular Open Robots Simulation Engine (MORSE)-based reinforcement learning framework is designed and validated on a continuous UAV tracking and landing task over a randomly moving platform, under high sensor noise and intermittent measurements. The simulation results show that, across different platform trajectories, the average landing success rate of the proposed algorithm is about 10% higher than that of the Proportional-Integral-Derivative (PID) method. As an indirect result, a state-of-the-art deep reinforcement learning-based UAV control method is validated, in which the UAV learns an optimal strategy for continuous autonomous landing and performs reliably in a simulation environment.
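The core of the approach described above is the DDPG update: a deterministic actor improved along the critic's action gradient, a critic trained by temporal-difference learning, and slowly tracking target networks. The following is a minimal, self-contained sketch of that update using linear function approximators in NumPy; it is a toy illustration of the general DDPG mechanism, not the authors' implementation, and all dimensions, learning rates, and the reward shown are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
S_DIM, A_DIM = 3, 2            # hypothetical state/action dimensions
GAMMA, TAU, LR = 0.99, 0.01, 1e-3

# Linear actor mu(s) = Wa @ s and linear critic Q(s, a) = wq @ [s; a]
Wa = rng.normal(scale=0.1, size=(A_DIM, S_DIM))
wq = rng.normal(scale=0.1, size=S_DIM + A_DIM)
Wa_t, wq_t = Wa.copy(), wq.copy()   # target networks (slow copies)

def ddpg_step(s, a, r, s2, done):
    """One DDPG update from a single transition (s, a, r, s2, done)."""
    global Wa, wq, Wa_t, wq_t
    # Critic: TD target computed with the *target* actor and critic
    a2 = Wa_t @ s2
    y = r + (0.0 if done else GAMMA * (wq_t @ np.concatenate([s2, a2])))
    x = np.concatenate([s, a])
    td = y - wq @ x
    wq = wq + LR * td * x           # gradient step on squared TD error
    # Actor: deterministic policy gradient dQ/da * dmu/dWa;
    # for this linear critic, dQ/da is simply the action part of wq
    dq_da = wq[S_DIM:]
    Wa = Wa + LR * np.outer(dq_da, s)
    # Soft-update the target networks toward the learned networks
    Wa_t = (1 - TAU) * Wa_t + TAU * Wa
    wq_t = (1 - TAU) * wq_t + TAU * wq
    return td

# One simulated transition with exploration noise on the action
s = rng.normal(size=S_DIM)
a = Wa @ s + 0.1 * rng.normal(size=A_DIM)
td = ddpg_step(s, a, r=1.0, s2=rng.normal(size=S_DIM), done=False)
```

In the full method a replay buffer and deep networks replace the single transition and linear models here, and the paper additionally blends the learned policy with heuristic landing rules.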