Gou Huabei, Guo Xiao, Lou Wenjie, Ou Jiajun, Yuan Jiace
School of Aeronautic Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China.
Frontier Institute of Science and Technology Innovation, Beijing University of Aeronautics and Astronautics, Beijing 100191, China.
Sensors (Basel). 2020 Dec 15;20(24):7176. doi: 10.3390/s20247176.
This paper proposes a reinforcement learning (RL)-based path-following strategy for underactuated airships subject to magnitude and rate saturation. First, the Markov decision process (MDP) model of the control problem is established. An error-bounded line-of-sight (LOS) guidance law is then investigated to restrain the state space. Subsequently, a proximal policy optimization (PPO) algorithm is employed to approximate the optimal action policy through trial and error. Since the optimal action policy is generated within the admissible action space, magnitude and rate saturation are avoided. Simulation results on circular, general, broken-line, and anti-wind path-following tasks demonstrate that the proposed control scheme transfers to new tasks without adaptation and exhibits satisfactory real-time performance and robustness.
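The abstract points to two concrete mechanisms: an LOS guidance law whose correction term remains bounded regardless of cross-track error, and the generation of actions inside an admissible set so that magnitude and rate saturation cannot occur. The Python sketch below illustrates both ideas under stated assumptions only; the look-ahead distance delta_los, the limits u_max and du_max, and the tanh squashing are illustrative choices, not the paper's actual formulation or parameters.

import numpy as np

# Minimal sketch (assumptions, not the paper's method) of two ideas from the
# abstract: (1) an LOS guidance law whose arctan form keeps the heading
# command bounded for any cross-track error, and (2) squashing plus rate
# limiting so policy outputs respect magnitude and rate saturation.

def los_heading_command(cross_track_error, path_angle, delta_los=50.0):
    """Desired heading: path tangent angle plus a bounded correction.

    arctan(error / look-ahead distance) lies in (-pi/2, pi/2), so the
    correction stays bounded even for large cross-track errors.
    """
    return path_angle - np.arctan2(cross_track_error, delta_los)

def bounded_action(raw_action, prev_action, u_max=1.0, du_max=0.1):
    """Map an unbounded policy output into the admissible action set.

    tanh enforces the magnitude limit |u| <= u_max; clipping the increment
    enforces the rate limit |u - u_prev| <= du_max per control step.
    """
    u = u_max * np.tanh(raw_action)                 # magnitude saturation
    du = np.clip(u - prev_action, -du_max, du_max)  # rate saturation
    return prev_action + du

if __name__ == "__main__":
    # A large cross-track error still yields a bounded heading command.
    psi_d = los_heading_command(cross_track_error=500.0, path_angle=0.0)
    print(f"desired heading: {psi_d:.3f} rad")      # near -pi/2, never beyond

    # A sequence of raw policy outputs passed through the limiter.
    u_prev = 0.0
    for raw in [3.0, -4.0, 0.2]:
        u_prev = bounded_action(raw, u_prev)
        print(f"applied action: {u_prev:.3f}")

Because the limiter is applied to whatever the policy outputs, any action reaching the plant already satisfies both constraints, which is consistent with the abstract's claim that saturation is avoided by construction rather than handled after the fact.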