IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5492-5503. doi: 10.1109/TNNLS.2021.3070852. Epub 2022 Oct 5.
This article develops an adaptive-observation-based efficient reinforcement learning (RL) approach for systems with uncertain drift dynamics. A novel concurrent learning adaptive extended observer (CL-AEO) is first designed to jointly estimate the system state and parameters. This observer has a two-time-scale structure and does not require any additional numerical technique to compute state derivative information. The idea of concurrent learning (CL) is leveraged to exploit recorded data, which leads to a relaxed, verifiable excitation condition for the convergence of the parameter estimates. Based on the state and parameter estimates provided by the CL-AEO, a simulation-of-experience-based RL scheme is developed to approximate the optimal control policy online. Rigorous theoretical analysis shows that practical convergence of the system state to the origin, and of the developed policy to the ideal optimal policy, is achieved without the persistence of excitation (PE) condition. Finally, the effectiveness and superiority of the developed methodology are demonstrated via comparative simulations.
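Although the abstract gives no implementation details, the concurrent-learning idea it refers to can be illustrated with a toy parameter-estimation sketch: a small stack of recorded regressor/output pairs keeps driving the estimate even after the current trajectory stops being exciting, so a verifiable rank condition on the recorded data replaces persistent excitation. Everything below (the regressor basis phi, the gains Gamma and k_cl, the recorded points) is an illustrative assumption, not the paper's CL-AEO.

```python
# Minimal sketch (not the paper's CL-AEO): concurrent-learning parameter
# estimation driven by recorded data. A rank condition on the recorded
# regressors, checkable offline, replaces persistent excitation.
import numpy as np

theta_true = np.array([1.5, -0.7])            # unknown drift parameters (illustrative)
phi = lambda x: np.array([x, x**2])           # assumed regressor basis

# Record a few (regressor, output) pairs while the signal is still rich.
X_rec = np.array([0.5, -1.0, 2.0])
Phi_rec = np.stack([phi(x) for x in X_rec])   # stacked recorded regressors
Y_rec = Phi_rec @ theta_true                  # recorded drift outputs
assert np.linalg.matrix_rank(Phi_rec) == 2    # verifiable rank (excitation) condition

theta_hat = np.zeros(2)
Gamma, k_cl, dt = 5.0, 1.0, 1e-3
for _ in range(20000):
    x = 0.0                                           # trajectory no longer exciting
    e_inst = phi(x) @ (theta_true - theta_hat)        # instantaneous error (zero at x = 0)
    e_rec = Y_rec - Phi_rec @ theta_hat               # errors on the recorded data
    # Gradient-style update: the recorded stack keeps driving the estimate.
    theta_dot = Gamma * (phi(x) * e_inst + k_cl * Phi_rec.T @ e_rec)
    theta_hat = theta_hat + dt * theta_dot

print(theta_hat)   # converges to theta_true despite zero current excitation
```

In this sketch the instantaneous error contributes nothing (the state sits at the origin), yet the estimate still converges because the recorded regressor stack is full rank; this is the basic mechanism behind relaxing PE to a rank condition on recorded data.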
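The "simulation of experience" idea, evaluating the Bellman error at sampled off-trajectory states through an identified model rather than only along the measured trajectory, can likewise be sketched for a scalar linear-quadratic problem. The system, cost weights, and quadratic critic basis below are assumptions chosen so the ideal critic weight has a closed-form Riccati solution for checking; this is not the paper's scheme.

```python
# Minimal sketch (assumed scalar LQR setup): critic update from Bellman errors
# evaluated at sampled "simulated experience" states using an identified model.
import numpy as np

a_hat, b = -1.0, 1.0          # identified drift (e.g. from an observer) and known input gain
q, R = 1.0, 1.0               # running cost q*x^2 + R*u^2

w = 0.0                       # critic weight, V(x) = w * x^2 (assumed basis)
eta, iters = 0.02, 2000
x_samples = np.linspace(-2.0, 2.0, 21)   # off-trajectory extrapolation points

for _ in range(iters):
    grads = 0.0
    for x in x_samples:
        u = -(b / (2.0 * R)) * (2.0 * w * x)                              # policy implied by critic
        delta = 2.0 * w * x * (a_hat * x + b * u) + q * x**2 + R * u**2   # Bellman error
        grads += delta * (2.0 * x * (a_hat * x + b * u))                  # d(delta)/dw, policy held fixed
    w -= eta * grads / len(x_samples)

# For this scalar problem the Riccati equation gives the ideal weight:
w_star = (a_hat + np.sqrt(a_hat**2 + b**2 * q / R)) * R / b**2
print(w, w_star)   # w approaches w_star ~ 0.414
```

Because the Bellman error is computed from the identified model at arbitrary sampled states, the critic can be trained without forcing the closed-loop trajectory itself to be persistently exciting, which is the practical payoff the abstract claims.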