Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China.
Neural Netw. 2019 Sep;117:1-7. doi: 10.1016/j.neunet.2019.04.026. Epub 2019 May 23.
A torsional pendulum device with hyperbolic tangent input nonlinearities can be formulated as a nonaffine system. Unlike for basic affine systems, optimal feedback control of complex nonaffine plants is difficult, yet important. This paper investigates approximate optimal control design for continuous-time nonaffine nonlinear systems using reinforcement learning. To make the learning algorithm easier to apply, an effective pre-compensation technique is adopted to transform the system appropriately. The integral policy iteration strategy is then incorporated to relax the requirement for knowledge of the system dynamics. Moreover, the actor-critic structure is implemented with neural network approximators. Finally, experimental verification on the torsional pendulum plant is conducted after a learning process of 20 iterations, and stable performance with a basic robustness guarantee is observed in two case studies.
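The pre-compensation idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the model or the compensator, so everything specific here is an assumption chosen for illustration: the pendulum parameters, the way tanh(u) enters the torque equation, the integrator pre-compensator du/dt = v, and all function names. The sketch only shows how augmenting the state with u can turn a system that is nonaffine in u into one that is affine in the new input v.

```python
import math

# Illustrative parameters (assumptions, not taken from the paper).
MASS, GRAVITY, LENGTH, FRICTION = 1.0 / 3.0, 9.8, 2.0 / 3.0, 0.2
INERTIA = 4.0 / 3.0 * MASS * LENGTH ** 2  # rotary inertia


def nonaffine_dynamics(theta, omega, u):
    """Pendulum with an assumed tanh input nonlinearity (nonaffine in u)."""
    d_theta = omega
    torque = math.tanh(u) - MASS * GRAVITY * LENGTH * math.sin(theta)
    d_omega = (torque - FRICTION * omega) / INERTIA
    return d_theta, d_omega


def augmented_dynamics(z, v):
    """Assumed integrator pre-compensator du/dt = v.

    With augmented state z = (theta, omega, u), the derivative has the
    affine form F(z) + G * v, where G = (0, 0, 1).
    """
    theta, omega, u = z
    d_theta, d_omega = nonaffine_dynamics(theta, omega, u)
    return (d_theta, d_omega, v)


def euler_step(z, v, dt=0.001):
    """One explicit Euler step of the augmented system."""
    dz = augmented_dynamics(z, v)
    return tuple(zi + dt * dzi for zi, dzi in zip(z, dz))
```

In this form, a learning scheme designed for affine systems could be applied to the augmented state with v as the control, while the original input u is recovered by integrating v.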