IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.
This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor-critic structure in which two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the identifier weight estimation error affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge exponentially fast to small neighborhoods of their ideal values. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used: recorded past experiences are used simultaneously with current data to adapt the identifier weights. Stability of the whole system, consisting of the actor, critic, system state, and system identifier, is guaranteed while all three networks undergo adaptation, and convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.
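The experience-replay idea for identifier adaptation can be sketched as follows. This is an illustrative sketch only, not the paper's actual update law: the linear-in-parameters identifier form, the regressor `phi`, the buffer size, and the learning rate are all assumptions made for the example. The point it illustrates is that reusing a buffer of recorded regressor/derivative pairs alongside the current sample relaxes the persistence-of-excitation requirement to a rank condition on the recorded data.

```python
import numpy as np

# Hypothetical identifier: xdot ≈ W.T @ phi(x, u), with W unknown.
# The weight estimate W_hat is adapted by gradient descent on the
# identification error, using the current sample together with a
# buffer of recorded past experiences (experience replay).

rng = np.random.default_rng(0)
W_true = np.array([[1.0, -0.5], [0.3, 0.8], [-0.2, 0.1]])  # 3 features -> 2 states

def phi(x, u):
    # assumed regressor vector, chosen only for illustration
    return np.array([x[0], x[1], u])

W_hat = np.zeros_like(W_true)
buffer = []            # recorded (regressor, derivative) pairs
buffer_size = 20
lr = 0.1

for k in range(500):
    x = rng.standard_normal(2)
    u = rng.standard_normal()
    p = phi(x, u)
    xdot = W_true.T @ p          # "measured" state derivative (noise-free here)

    # gradient of the squared identification error over current + recorded data
    grad = np.outer(p, W_hat.T @ p - xdot)
    for pj, dj in buffer:
        grad += np.outer(pj, W_hat.T @ pj - dj)
    W_hat -= lr * grad / (1 + len(buffer))

    # record the experience; once the recorded regressors span the
    # feature space, excitation of the current signal is no longer needed
    if len(buffer) < buffer_size:
        buffer.append((p, xdot))

print(np.max(np.abs(W_hat - W_true)))  # weight estimation error
```

Because the recorded regressors here span the three-dimensional feature space, the combined gradient drives the weight error toward zero even when any single current sample is not exciting, which is the easy-to-check condition the abstract refers to.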