IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2154-2166. doi: 10.1109/TNNLS.2018.2808102.
An infinite-horizon optimal regulation problem for a control-affine deterministic system is solved online using a local state following (StaF) kernel and a regional model-based reinforcement learning (R-MBRL) method to approximate the value function. Unlike traditional methods such as R-MBRL, which aim to approximate the value function over a large compact set, the StaF kernel approach approximates the value function in a local neighborhood of the state that travels within a compact set. In this paper, the value function is approximated using a state-dependent convex combination of the StaF-based and R-MBRL-based approximations. As the state enters a neighborhood containing the origin, the value function transitions from the StaF-based approximation to the R-MBRL-based approximation. Semiglobal uniformly ultimately bounded (SGUUB) convergence of the system states to the origin is established using a Lyapunov-based analysis. Simulation results are provided for two-, three-, six-, and ten-state dynamical systems to demonstrate the scalability and performance of the developed method.
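The blending described above can be read as a state-dependent convex combination V_hat(x) = lambda(x) * V_StaF(x) + (1 - lambda(x)) * V_RMBRL(x), where lambda(x) goes to 0 as the state nears the origin. The sketch below is a minimal illustration of this idea only, not the paper's implementation: the Gaussian kernels, the smoothstep transition, and all names and radii (v_staf, v_rmbrl, blend_weight, r_inner, r_outer) are illustrative assumptions.

```python
import numpy as np

def v_staf(x, center_offsets, weights):
    """Local StaF-style estimate: kernel centers follow the current
    state x, covering only a moving neighborhood of it."""
    centers = x + center_offsets                 # kernels travel with x
    dists = np.linalg.norm(x - centers, axis=1)
    return weights @ np.exp(-dists**2)

def v_rmbrl(x, centers, weights):
    """Regional R-MBRL-style estimate: kernel centers are fixed over a
    compact set containing the origin."""
    dists = np.linalg.norm(x - centers, axis=1)
    return weights @ np.exp(-dists**2)

def blend_weight(x, r_inner=0.5, r_outer=1.0):
    """Smooth state-dependent weight lambda(x) in [0, 1]: 0 near the
    origin (favor R-MBRL), 1 far from it (favor StaF). Radii are
    illustrative, not from the paper."""
    r = np.linalg.norm(x)
    t = np.clip((r - r_inner) / (r_outer - r_inner), 0.0, 1.0)
    return 3 * t**2 - 2 * t**3                   # smoothstep

def v_blended(x, staf_params, rmbrl_params):
    """State-dependent convex combination of the two approximations."""
    lam = blend_weight(x)
    return lam * v_staf(x, *staf_params) + (1 - lam) * v_rmbrl(x, *rmbrl_params)

# Example usage with arbitrary kernel placements and weights.
rng = np.random.default_rng(0)
x = np.array([0.8, -0.3])
staf_params = (0.1 * rng.standard_normal((3, 2)), rng.standard_normal(3))
rmbrl_params = (rng.uniform(-1.0, 1.0, (10, 2)), rng.standard_normal(10))
print(v_blended(x, staf_params, rmbrl_params))
```

In this toy form, the transition is handled entirely by lambda(x); in the paper the weight update laws and the Lyapunov analysis ensure the switch between approximations preserves SGUUB convergence.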