Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems.

Publication information

IEEE Trans Neural Netw Learn Syst. 2015 Jun;26(6):1323-34. doi: 10.1109/TNNLS.2015.2402203. Epub 2015 Mar 3.

Abstract

In this paper, we establish error bounds of adaptive dynamic programming algorithms for solving undiscounted infinite-horizon optimal control problems of discrete-time deterministic nonlinear systems. We consider approximation errors in the update equations of both value function and control policy. We utilize a new assumption instead of the contraction assumption in discounted optimal control problems. We establish the error bounds for approximate value iteration based on a new error condition. Furthermore, we also establish the error bounds for approximate policy iteration and approximate optimistic policy iteration algorithms. It is shown that the iterative approximate value function can converge to a finite neighborhood of the optimal value function under some conditions. To implement the developed algorithms, critic and action neural networks are used to approximate the value function and control policy, respectively. Finally, a simulation example is given to demonstrate the effectiveness of the developed algorithms.
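To make the idea of approximate value iteration with a bounded error concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration and is not taken from the paper: the scalar linear system, the quadratic stage cost, the function names `exact_update` and `value_iteration`, and the multiplicative error model with bound `eta`. The paper's own error condition and its neural network approximators are not reproduced here; the value function is parameterized exactly as V(x) = p x^2.

```python
import random


def exact_update(p, a=0.9, b=1.0, q=1.0, r=1.0):
    """One exact value-iteration step for V_i(x) = p * x**2.

    Hypothetical toy setup: system x_{k+1} = a*x + b*u with stage cost
    q*x**2 + r*u**2 and no discount factor. Minimizing over u in closed
    form gives the scalar Riccati recursion below.
    """
    return q + (a * a * p * r) / (r + b * b * p)


def value_iteration(steps, eta=0.0, seed=0):
    """Run value iteration from V_0 = 0, with an optional per-step error.

    Each update is scaled by (1 + e_i), where e_i is drawn uniformly from
    [-eta, eta]. This multiplicative error is an illustrative assumption,
    standing in for a function-approximation error. eta = 0 gives the
    exact iteration.
    """
    rng = random.Random(seed)
    p = 0.0
    for _ in range(steps):
        error = rng.uniform(-eta, eta)
        p = (1.0 + error) * exact_update(p)
    return p


p_star = value_iteration(200)           # exact iteration
p_hat = value_iteration(200, eta=0.05)  # iteration with bounded error

print("exact coefficient:      ", p_star)
print("approximate coefficient:", p_hat)
print("gap:                    ", abs(p_hat - p_star))
```

In this toy case the approximate coefficient does not converge to the exact one, but it stays within a band around it whose width depends on `eta`, which is the qualitative behavior the abstract describes as convergence to a finite neighborhood of the optimal value function.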
