Zhou Yuxiang, Shu Jiansheng, Zheng Xiaolong, Hao Hui, Song Huan
Xi'an Research Institute of High Technology, Xi'an, China.
Front Neurorobot. 2022 Dec 5;16:1025817. doi: 10.3389/fnbot.2022.1025817. eCollection 2022.
As UAV, navigation, and positioning technologies develop and see wider use, UAVs face higher requirements for maneuvering obstacle avoidance and real-time route planning. This paper addresses real-time UAV route planning in unknown environments. Drawing on the artificial potential field method, we modify the state observation and the reward function, which addresses the sparse-reward problem of the reinforcement learning algorithm and speeds up its convergence. To improve generalization, we train the algorithm step by step according to task difficulty, following the ideas of curriculum learning and transfer learning. Simulation results show that the improved soft actor-critic (SAC) algorithm converges quickly, responds in a timely manner, generalizes well, and completes the UAV route planning task well.
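The abstract does not give the modified reward function, so the following is only a minimal sketch of how an artificial-potential-field idea can turn a sparse goal/collision reward into a dense one. Everything in it is an assumption made for illustration: the function name `apf_shaped_reward`, the spherical obstacle model, the gains `k_att` and `k_rep`, the influence distance, and the terminal bonus and penalty values are hypothetical and are not taken from the paper.

```python
import math


def apf_shaped_reward(prev_pos, pos, goal, obstacles,
                      k_att=1.0, k_rep=0.5, influence=5.0,
                      goal_radius=1.0, goal_bonus=100.0,
                      collision_penalty=-100.0):
    """Dense, potential-field-inspired reward (illustrative, not the paper's).

    prev_pos, pos, goal: coordinate tuples of equal length.
    obstacles: list of (centre, radius) pairs, modelled as spheres (assumption).
    """
    dist = math.dist(pos, goal)

    # Sparse terminal reward: the goal has been reached.
    if dist <= goal_radius:
        return goal_bonus

    # Attractive term: reward the progress made toward the goal this step.
    reward = k_att * (math.dist(prev_pos, goal) - dist)

    for centre, radius in obstacles:
        gap = math.dist(pos, centre) - radius  # distance to obstacle surface

        # Sparse terminal penalty: the UAV is inside an obstacle.
        if gap <= 0.0:
            return collision_penalty

        # Repulsive term: classic APF potential, active only within the
        # influence distance and growing sharply as the gap shrinks.
        if gap < influence:
            reward -= 0.5 * k_rep * (1.0 / gap - 1.0 / influence) ** 2

    return reward
```

With these assumed defaults, a step from `(0, 0)` to `(1, 0)` toward a goal at `(10, 0)` earns a positive reward in free space and a slightly smaller one when an obstacle lies within the influence distance, so the agent receives a learning signal at every step rather than only at the goal or on collision.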