Deep Reinforcement Learning Approach with Multiple Experience Pools for UAV's Autonomous Motion Planning in Complex Unknown Environments.

Affiliation

School of Electronic and Information, Northwestern Polytechnical University, Xi'an 710129, China.

Publication

Sensors (Basel). 2020 Mar 29;20(7):1890. doi: 10.3390/s20071890.

Abstract

Autonomous motion planning (AMP) of unmanned aerial vehicles (UAVs) aims to enable a UAV to fly safely to its target without human intervention. Recently, several emerging deep reinforcement learning (DRL) methods have been employed to address the AMP problem in simplified environments, and these methods have yielded good results. This paper proposes a multiple experience pools (MEP) framework that leverages human expert experiences to speed up the DRL learning process. Based on the deep deterministic policy gradient (DDPG) algorithm, an MEP-DDPG algorithm was designed that uses model predictive control and simulated annealing to generate expert experiences. When this algorithm was applied to a complex unknown simulation environment constructed from the parameters of a real UAV, the training experiments showed that the novel DRL algorithm improved performance by more than 20% compared with the state-of-the-art DDPG. The experimental testing results indicate that UAVs trained with MEP-DDPG can stably complete a variety of tasks in complex, unknown environments.
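The core idea of the MEP framework, as described in the abstract, is to train a DDPG-style agent on minibatches drawn from more than one replay buffer: one pool holds expert demonstrations (generated offline by a model-predictive-control planner tuned with simulated annealing), while another holds the agent's own exploration experiences. The sketch below illustrates only this mixed-sampling mechanism; the class names, the `expert_ratio` parameter, and the fixed mixing scheme are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of mixed minibatch sampling from multiple experience pools,
# as suggested by the MEP-DDPG abstract. Names and the mixing ratio are
# illustrative assumptions, not the authors' published implementation.
import random
from collections import deque

class ExperiencePool:
    """Fixed-capacity FIFO buffer of (state, action, reward, next_state, done)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, n):
        # Sample without replacement; cap at the current pool size.
        return random.sample(list(self.buffer), min(n, len(self.buffer)))

def sample_minibatch(expert_pool, agent_pool, batch_size=64, expert_ratio=0.25):
    """Draw a mixed minibatch: a fraction from expert demonstrations
    (e.g. produced by an MPC + simulated-annealing planner), the rest
    from the agent's self-collected experiences."""
    n_expert = int(batch_size * expert_ratio)
    batch = expert_pool.sample(n_expert)
    batch += agent_pool.sample(batch_size - len(batch))
    random.shuffle(batch)
    return batch

if __name__ == "__main__":
    expert_pool = ExperiencePool(capacity=100_000)  # filled offline by the expert planner
    agent_pool = ExperiencePool(capacity=100_000)   # filled online during exploration
    for _ in range(200):
        expert_pool.add(("s_e", "a_e", 1.0, "s_e_next", False))
        agent_pool.add(("s", "a", 0.0, "s_next", False))
    batch = sample_minibatch(expert_pool, agent_pool)
    print(len(batch), "transitions sampled for one DDPG update")
```

In a full training loop, each sampled minibatch would feed the standard DDPG critic and actor updates; seeding part of every batch with expert transitions is what gives the reported speed-up in early learning.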

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aad/7180781/7244b87ea936/sensors-20-01890-g001.jpg
