School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China.
Neural Netw. 2021 Dec;144:176-186. doi: 10.1016/j.neunet.2021.08.025. Epub 2021 Aug 28.
A data-based value iteration algorithm with a bidirectional approximation feature is developed for discounted optimal control. The unknown nonlinear system dynamics are first identified by establishing a model neural network. To improve the identification precision, biases are introduced into the model network. The model network with biases is trained by gradient descent, with the weights and biases of all layers being updated. Uniform ultimate boundedness under a proper learning rate is analyzed using the Lyapunov approach. Moreover, an integrated value iteration scheme with a discounted cost is developed to guarantee the approximation accuracy of the optimal value function. Finally, the effectiveness of the proposed algorithm is demonstrated through two simulation examples with physical backgrounds.
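To make the idea of value iteration with a discounted cost concrete, the following is a minimal sketch on a toy problem, not the algorithm of the paper. It assumes a known scalar linear system x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2 and discount factor gamma, so that each iterate has the form V_i(x) = p_i*x^2 and no neural networks are needed. The function name, the parameters, and the zero initialization are all illustrative assumptions.

```python
def discounted_value_iteration(a, b, q, r, gamma, tol=1e-10, max_iter=10_000):
    """Discounted value iteration for x_{k+1} = a*x_k + b*u_k (toy assumption).

    Stage cost is q*x^2 + r*u^2. Because V_i(x) = p_i * x^2 for this problem,
    each value update reduces to a scalar recursion on p_i.
    Returns (p, k, iterations), where u = -k*x is the greedy control law
    computed in the last sweep.
    """
    p = 0.0  # V_0(x) = 0, an assumed initialization
    k = 0.0
    for i in range(max_iter):
        # Minimize r*u^2 + gamma*p*(a*x + b*u)^2 over u, giving u = -k*x.
        k = gamma * a * b * p / (r + gamma * b * b * p)
        # Value update: stage cost plus discounted value at the next state.
        p_next = q + r * k * k + gamma * p * (a - b * k) ** 2
        if abs(p_next - p) < tol:
            return p_next, k, i + 1
        p = p_next
    return p, k, max_iter


if __name__ == "__main__":
    p, k, n = discounted_value_iteration(a=1.2, b=1.0, q=1.0, r=1.0, gamma=0.95)
    print(f"p = {p:.6f}, gain k = {k:.6f}, sweeps = {n}")
```

In the data-based setting described in the abstract, the known pair (a, b) would be replaced by the trained model network and the quadratic form by an approximator of the value function; this sketch only illustrates the discounted update itself.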