Yang Yongliang, Vamvoudakis Kyriakos G, Modares Hamidreza, Yin Yixin, Wunsch Donald C
IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5441-5455. doi: 10.1109/TNNLS.2020.2967871. Epub 2020 Nov 30.
In this article, we present an intermittent framework for safe reinforcement learning (RL) algorithms. First, we develop a barrier function-based system transformation that imposes the state constraints while converting the original constrained problem into an unconstrained optimization problem. Second, based on the derived optimal policies, two types of intermittent feedback RL algorithms are presented, namely, a static and a dynamic one. Finally, we leverage an actor/critic structure to solve the problem online while guaranteeing optimality, stability, and safety. Simulation results show the efficacy of the proposed approach.
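To make the first step concrete, the following is a minimal numerical sketch of a barrier function-based state transformation, assuming a logarithmic barrier of the form b(x; a, A) = log((A/a)(a - x)/(A - x)) with bounds a < 0 < A, one common choice for mapping a constrained interval onto the whole real line. The bounds, variable names, and test values are illustrative assumptions, not taken from the article.

import numpy as np

# Hedged sketch (assumed form, not necessarily the article's exact barrier):
# map a constrained state x in (a, A), with a < 0 < A, to an unconstrained
# transformed state s = b(x), so the optimization can be carried out over s
# without explicit state constraints.

def barrier(x, a, A):
    """Map x in (a, A) onto (-inf, inf); b(0) = 0 since a < 0 < A."""
    return np.log((A / a) * (a - x) / (A - x))

def barrier_inv(s, a, A):
    """Inverse map: recover x in (a, A) from the unconstrained s."""
    return a * A * (np.exp(s) - 1.0) / (a * np.exp(s) - A)

if __name__ == "__main__":
    a, A = -2.0, 3.0  # illustrative state bounds (assumed)
    for x in (-1.9, 0.0, 2.9):
        s = barrier(x, a, A)
        assert abs(barrier_inv(s, a, A) - x) < 1e-9  # round trip is exact
        print(f"x = {x:+.2f}  ->  s = {s:+.4f}")

As x approaches either bound, s diverges to +/- infinity, so any policy expressed in the transformed coordinate keeps the original state strictly inside (a, A) by construction.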