Yang Qinmin, Jagannathan Sarangapani
State Key Laboratory of Industrial Control Technology, Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China.
IEEE Trans Syst Man Cybern B Cybern. 2012 Apr;42(2):377-90. doi: 10.1109/TSMCB.2011.2166384. Epub 2011 Sep 23.
In this paper, reinforcement-learning-based adaptive critic controller designs with state and output feedback are proposed using online approximators (OLAs) for general multi-input-multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network that is designed to produce an optimal control signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and critic networks, although any OLA, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as an observer to estimate the unavailable system states; thus, the separation principle is not required. The NN weight-tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.
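To illustrate the heuristic-dynamic-programming recursion that the critic is tuned by, the sketch below applies the same cost-to-go idea to an assumed scalar linear plant, with a single quadratic weight standing in for the critic network. The plant, gains, and learning rate here are illustrative assumptions for a toy example, not the paper's NN tuning laws or stability proof:

```python
import numpy as np

# Toy HDP-style critic tuning on an assumed scalar plant
# x_{k+1} = a*x_k + b*u_k (the paper treats general MIMO affine
# nonlinear discrete-time systems with NN approximators).
a, b = 1.1, 1.0        # plant parameters (open loop unstable)
q, r = 1.0, 0.1        # stage cost q*x^2 + r*u^2
K = 0.6                # initial stabilizing action gain: u = -K*x
wc, lr = 0.0, 0.1      # critic weight, J(x) ~ wc*x^2, and step size

rng = np.random.default_rng(0)
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)          # sampled training state
    u = -K * x
    x_next = a * x + b * u
    cost = q * x**2 + r * u**2
    # HDP recursion: J(x_k) should equal cost + J(x_{k+1});
    # the temporal-difference error drives the critic update.
    td = cost + wc * x_next**2 - wc * x**2
    wc += lr * td * x**2                # gradient step on squared TD error

# One-step action improvement using the learned critic:
# minimize r*u^2 + wc*(a*x + b*u)^2 over u, giving u = -K_new*x.
K_new = wc * a * b / (r + wc * b**2)
# wc converges to (q + r*K**2) / (1 - (a - b*K)**2) for this fixed
# policy, and |a - b*K_new| < |a - b*K| (a tighter closed loop).
```

In this linear special case the critic's fixed point can be checked in closed form, which is what makes the toy example verifiable; the paper's contribution is doing the analogous tuning online with NNs for unknown nonlinear dynamics.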