Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems.

Publication information

IEEE Trans Neural Netw Learn Syst. 2015 Jun;26(6):1323-34. doi: 10.1109/TNNLS.2015.2402203. Epub 2015 Mar 3.

DOI: 10.1109/TNNLS.2015.2402203
PMID: 25751878
Abstract

In this paper, we establish error bounds of adaptive dynamic programming algorithms for solving undiscounted infinite-horizon optimal control problems of discrete-time deterministic nonlinear systems. We consider approximation errors in the update equations of both value function and control policy. We utilize a new assumption instead of the contraction assumption in discounted optimal control problems. We establish the error bounds for approximate value iteration based on a new error condition. Furthermore, we also establish the error bounds for approximate policy iteration and approximate optimistic policy iteration algorithms. It is shown that the iterative approximate value function can converge to a finite neighborhood of the optimal value function under some conditions. To implement the developed algorithms, critic and action neural networks are used to approximate the value function and control policy, respectively. Finally, a simulation example is given to demonstrate the effectiveness of the developed algorithms.
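
The claim that the iterative approximate value function settles into a finite neighborhood of the optimal value function can be illustrated with a small numerical sketch. The example below is not the paper's algorithm or its error condition. It assumes a scalar linear system `x_next = a*x + b*u` with quadratic stage cost `q*x**2 + r*u**2` as a stand-in for the nonlinear case, so that the value function is `V(x) = p * x**2` and one undiscounted value-iteration step has a closed form. The approximation error is modeled, again as an assumption, by a bounded multiplicative perturbation `(1 + eps)` with `|eps| <= delta` applied after each update. No critic or action network is used, and all names and parameter values are hypothetical.

```python
import random


def bellman_update(p, a=0.9, b=1.0, q=1.0, r=1.0):
    """One exact, undiscounted value-iteration step for V(x) = p * x**2.

    Minimizing q*x**2 + r*u**2 + p*(a*x + b*u)**2 over u gives
    u = -(p*a*b / (r + b*b*p)) * x, and substituting back yields the
    coefficient returned here.
    """
    return q + (a * a * p * r) / (r + b * b * p)


def approximate_value_iteration(delta, iterations=200, seed=0):
    """Value iteration from V_0 = 0 with a bounded multiplicative error.

    Assumed error model (not taken from the paper): each update is scaled
    by (1 + eps), where eps is drawn uniformly from [-delta, delta].
    """
    rng = random.Random(seed)
    p = 0.0
    history = []
    for _ in range(iterations):
        eps = rng.uniform(-delta, delta)
        p = (1.0 + eps) * bellman_update(p)
        history.append(p)
    return history


def optimal_p(iterations=1000):
    """Error-free value iteration, run long enough to reach its fixed point."""
    p = 0.0
    for _ in range(iterations):
        p = bellman_update(p)
    return p


if __name__ == "__main__":
    p_star = optimal_p()
    tail = approximate_value_iteration(delta=0.05)[-50:]
    worst_gap = max(abs(p - p_star) for p in tail)
    print("optimal coefficient:", p_star)
    print("largest gap over the last 50 iterations:", worst_gap)
```

With `delta=0` the iteration reproduces the error-free fixed point. With a nonzero `delta` the iterates keep fluctuating but remain in a band around the optimal coefficient whose width grows with `delta`, which is the qualitative behavior the abstract describes.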


Similar articles

1
Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems.
IEEE Trans Neural Netw Learn Syst. 2015 Jun;26(6):1323-34. doi: 10.1109/TNNLS.2015.2402203. Epub 2015 Mar 3.
2
Finite-approximation-error-based discrete-time iterative adaptive dynamic programming.
IEEE Trans Cybern. 2014 Dec;44(12):2820-33. doi: 10.1109/TCYB.2014.2354377. Epub 2014 Sep 26.
3
Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound.
IEEE Trans Neural Netw. 2011 Jan;22(1):24-36. doi: 10.1109/TNN.2010.2076370. Epub 2010 Sep 27.
4
Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems.
IEEE Trans Neural Netw Learn Syst. 2018 Jul;29(7):2794-2807. doi: 10.1109/TNNLS.2017.2702566. Epub 2017 Jun 6.
5
An iterative ϵ-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state.
Neural Netw. 2012 Aug;32:236-44. doi: 10.1016/j.neunet.2012.02.027. Epub 2012 Feb 24.
6
Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems.
IEEE Trans Cybern. 2016 Mar;46(3):840-53. doi: 10.1109/TCYB.2015.2492242. Epub 2015 Nov 2.
7
Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems.
IEEE Trans Cybern. 2013 Apr;43(2):779-89. doi: 10.1109/TSMCB.2012.2216523. Epub 2013 Mar 7.
8
Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):943-9. doi: 10.1109/TSMCB.2008.926614.
9
Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming.
IEEE Trans Neural Netw. 2011 Dec;22(12):1851-62. doi: 10.1109/TNN.2011.2172628. Epub 2011 Nov 1.
10
Approximate robust policy iteration using multilayer perceptron neural networks for discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices.
IEEE Trans Neural Netw. 2010 Aug;21(8):1270-80. doi: 10.1109/TNN.2010.2050334. Epub 2010 Jul 1.