Mahmud S M Nahid, Nivison Scott A, Bell Zachary I, Kamalapurkar Rushikesh
School of Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, OK, United States.
Munitions Directorate, Air Force Research Laboratory, Eglin AFB, FL, United States.
Front Robot AI. 2021 Dec 16;8:733104. doi: 10.3389/frobt.2021.733104. eCollection 2021.
Reinforcement learning has been established over the past decade as an effective tool for finding optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed in terms of state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn optimal control policies under state constraints. To relax the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop a safe reinforcement learning method for deterministic nonlinear systems with parametric uncertainties in the model, so that approximate constrained optimal policies can be learned without relying on stringent excitation conditions. To that end, a model-based reinforcement learning technique that utilizes a novel filtered concurrent learning method, along with a barrier transformation, is developed in this paper to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
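To illustrate the idea behind the barrier transformation mentioned in the abstract, the following is a minimal sketch of a log-type barrier mapping commonly used in this line of work. It maps a state constrained to an interval (a, A), with a < 0 < A, onto the whole real line, so that an unconstrained learning method applied to the transformed state automatically respects the original constraint. The specific functional form below is an illustrative assumption and may differ from the transformation used in the paper.

```python
import math

def barrier(x, a, A):
    """Map x in the open interval (a, A), with a < 0 < A, to s in (-inf, inf).

    As x -> a the output tends to -inf, as x -> A it tends to +inf,
    and barrier(0) = 0, so the origin is preserved.
    """
    return math.log(A * (a - x) / (a * (A - x)))

def barrier_inv(s, a, A):
    """Inverse map: recover the constrained state x from the transformed state s.

    Any finite s maps back into (a, A), so bounded trajectories of the
    transformed system satisfy the original state constraint.
    """
    return a * A * (math.exp(s) - 1.0) / (a * math.exp(s) - A)
```

In a barrier-transformation approach, the dynamics are rewritten in terms of the transformed state s = barrier(x), the (approximate) optimal policy is learned for the unconstrained s-system, and boundedness of s then guarantees that x remains inside (a, A) throughout learning.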