Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning.

Authors

Zhang Lingzhi, Xie Lei, Jiang Yi, Li Zhishan, Liu Xueqin, Su Hongye

Publication

IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):854-865. doi: 10.1109/TNNLS.2023.3326397. Epub 2025 Jan 7.

Abstract

The state and input constraints of nonlinear systems can greatly impede the realization of their optimal control with reinforcement learning (RL)-based approaches, since the commonly used quadratic utility functions cannot meet the requirements of constrained optimization problems. This article develops a novel optimal control approach for constrained discrete-time (DT) nonlinear systems based on safe RL. Specifically, a barrier function (BF) is introduced and incorporated into the value function to transform the constrained optimization problem into an unconstrained one, while guaranteeing that the minimum of the resulting problem still occurs at the origin. A constrained policy iteration (PI) algorithm is then developed to realize optimal control of the nonlinear system while ensuring that the state and input constraints are satisfied. The constrained optimal control policy and its corresponding value function are obtained through the implementation of two neural networks (NNs). Performance analysis shows that the proposed approach retains the convergence and optimality properties of the traditional PI algorithm, and simulation results on three examples demonstrate its effectiveness.
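The abstract only names the ingredients. As a concrete reference point, the following is a minimal sketch of one common recentered log-barrier construction used in constrained ADP/safe RL; the barrier form, the bounds (a, A), and the weights Q and R are assumptions for illustration, not the paper's exact definitions.

```latex
% One common recentered log-barrier for a scalar z constrained to the
% open interval (a, A) with a < 0 < A (an assumed form; the paper's
% exact BF may differ):
\beta(z) = -\log\big((A - z)(z - a)\big), \qquad
B(z) = \beta(z) - \beta(0) - \beta'(0)\,z,

% so that B(0) = 0, B'(0) = 0, B(z) \ge 0, and B(z) \to \infty at either
% bound. The barrier-augmented utility and value function then read
U(x_k, u_k) = x_k^\top Q x_k + u_k^\top R u_k
  + \sum_i B_i(x_{k,i}) + \sum_j B_j(u_{k,j}),
\qquad
V^{\pi}(x_k) = \sum_{t=k}^{\infty} U\big(x_t, \pi(x_t)\big),

% and constrained PI alternates the standard DT evaluation/improvement
% steps on this augmented cost:
V^{(j)}(x_k) = U\big(x_k, \pi^{(j)}(x_k)\big) + V^{(j)}(x_{k+1}),
\qquad
\pi^{(j+1)}(x_k) = \arg\min_{u}\Big[ U(x_k, u) + V^{(j)}\big(f(x_k, u)\big) \Big].
```

The sketch below runs this constrained PI loop on a toy scalar system with tabular value and policy stand-ins in place of the paper's two NNs. The dynamics, grids, and discount factor are all assumptions made so the example is small and self-contained.

```python
import numpy as np

# Recentered log barrier for z in the open interval (a, A), a < 0 < A.
# One common construction (assumed here; the paper's exact BF may differ):
# B(0) = 0, B'(0) = 0, and B(z) -> +inf as z -> a or z -> A, so adding it
# to a quadratic utility keeps the minimum of the augmented cost at the origin.
def recentered_barrier(z, a, A):
    def beta(s):
        return -np.log((A - s) * (s - a))
    dbeta0 = 1.0 / A + 1.0 / a  # beta'(0)
    return beta(z) - beta(0.0) - dbeta0 * z

# Toy constrained DT nonlinear system (illustrative only, not the paper's).
def f(x, u):
    return 0.8 * np.sin(x) + u

Q, R = 1.0, 1.0
gamma = 0.95                  # discount added so this tabular sketch converges;
                              # the paper's undiscounted formulation differs
x_lo, x_hi = -1.0, 1.0        # state constraint: x in (-1, 1)
u_lo, u_hi = -0.5, 0.5        # input constraint: u in (-0.5, 0.5)

xs = np.linspace(x_lo + 1e-2, x_hi - 1e-2, 201)  # state grid (strict interior)
us = np.linspace(u_lo + 1e-2, u_hi - 1e-2, 51)   # action grid

def utility(x, u):
    return (Q * x**2 + R * u**2
            + recentered_barrier(x, x_lo, x_hi)
            + recentered_barrier(u, u_lo, u_hi))

V = np.zeros_like(xs)   # tabular critic (stand-in for the paper's critic NN)
pi = np.zeros_like(xs)  # tabular actor  (stand-in for the paper's actor NN)

def interp_V(x):
    # Evaluate V off-grid; clipping only affects evaluation at the edges.
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

for _ in range(40):  # constrained policy iteration
    # Policy evaluation: V(x) = U(x, pi(x)) + gamma * V(f(x, pi(x)))
    for _ in range(50):
        V = np.array([utility(x, p) + gamma * interp_V(f(x, p))
                      for x, p in zip(xs, pi)])
    # Policy improvement: pi(x) = argmin_u [ U(x, u) + gamma * V(f(x, u)) ]
    for i, x in enumerate(xs):
        q = [utility(x, u) + gamma * interp_V(f(x, u)) for u in us]
        pi[i] = us[int(np.argmin(q))]

print("pi(0) =", pi[len(xs) // 2], " V(0) =", V[len(xs) // 2])
```

In the paper, the two tables are replaced by critic and actor NNs trained to satisfy the same evaluation and improvement conditions; the tabular version only illustrates how the barrier terms drive the cost to infinity at the bounds, which keeps both the greedy actions and the visited states strictly inside their constraint sets.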

