Suppr 超能文献



Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems.

Authors

Guo Wentao, Si Jennie, Liu Feng, Mei Shengwei

Publication

IEEE Trans Neural Netw Learn Syst. 2018 Jul;29(7):2794-2807. doi: 10.1109/TNNLS.2017.2702566. Epub 2017 Jun 6.

DOI:10.1109/TNNLS.2017.2702566
PMID:28600262
Abstract

Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which has been extensively covered in controls literature for its good theoretical properties and for its success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples including a practical problem of excitation control of a hydrogenerator.
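The policy iteration scheme the abstract builds on alternates policy evaluation (solving the Bellman equation under a fixed policy) with greedy policy improvement. A minimal tabular sketch on a hypothetical discretized scalar system; the dynamics, grids, and the small discount factor are illustrative simplifications (the paper treats the undiscounted case with function approximation):

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 41)   # discretized state space (illustrative)
us = np.linspace(-1.0, 1.0, 21)   # discretized action space

def f(x, u):                       # hypothetical system x_{k+1} = 0.9*sin(x_k) + u_k
    return 0.9 * np.sin(x) + u

def cost(x, u):                    # quadratic stage cost x^2 + u^2
    return x**2 + u**2

def nearest(x):                    # project the next state onto the grid
    return int(np.argmin(np.abs(xs - np.clip(x, xs[0], xs[-1]))))

gamma = 0.95                       # mild discounting for numerical convergence only
V = np.zeros(len(xs))
policy = np.full(len(xs), 10)      # initial policy: u = us[10] = 0 (stabilizing here)

for _ in range(50):
    # Policy evaluation: sweep the Bellman equation under the fixed policy.
    for _ in range(200):
        V = np.array([cost(xs[i], us[policy[i]])
                      + gamma * V[nearest(f(xs[i], us[policy[i]]))]
                      for i in range(len(xs))])
    # Policy improvement: act greedily w.r.t. the evaluated value function.
    new_policy = np.array([int(np.argmin([cost(xs[i], u) + gamma * V[nearest(f(xs[i], u))]
                                          for u in us]))
                           for i in range(len(xs))])
    if np.array_equal(new_policy, policy):  # policy unchanged => converged
        break
    policy = new_policy
```

In the paper's setting the inexact step is the policy improvement: the greedy policy is replaced by a parameterized approximation, and the analysis bounds the effect of that approximation error on stability and value convergence.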

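For the policy approximator the authors use a Volterra series, a polynomial expansion in current and past states. A truncated second-order Volterra feedback law can be sketched as follows; the memory length and kernel values here are arbitrary placeholders, not the paper's design:

```python
import numpy as np

M = 3                                 # memory length of the truncated series (illustrative)
rng = np.random.default_rng(0)
a = rng.normal(size=M)                # first-order (linear) kernel
b = rng.normal(size=(M, M))           # second-order kernel

def volterra_policy(x_hist):
    """Control from the last M states: u = a^T x + x^T b x (second-order truncation)."""
    x = np.asarray(x_hist[-M:], dtype=float)
    return float(a @ x + x @ b @ x)
```

The kernels `a` and `b` play the role of the policy parameters that would be fit to the greedy policy at each improvement step.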

Similar Articles

1. Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems.
IEEE Trans Neural Netw Learn Syst. 2018 Jul;29(7):2794-2807. doi: 10.1109/TNNLS.2017.2702566. Epub 2017 Jun 6.
2. Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems.
IEEE Trans Neural Netw Learn Syst. 2015 Jun;26(6):1323-34. doi: 10.1109/TNNLS.2015.2402203. Epub 2015 Mar 3.
3. Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems.
IEEE Trans Cybern. 2016 Mar;46(3):840-53. doi: 10.1109/TCYB.2015.2492242. Epub 2015 Nov 2.
4. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems.
IEEE Trans Neural Netw Learn Syst. 2014 Mar;25(3):621-34. doi: 10.1109/TNNLS.2013.2281663.
5. Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems.
IEEE Trans Cybern. 2013 Apr;43(2):779-89. doi: 10.1109/TSMCB.2012.2216523. Epub 2013 Mar 7.
6. Finite-approximation-error-based discrete-time iterative adaptive dynamic programming.
IEEE Trans Cybern. 2014 Dec;44(12):2820-33. doi: 10.1109/TCYB.2014.2354377. Epub 2014 Sep 26.
7. Discrete-Time Stable Generalized Self-Learning Optimal Control With Approximation Errors.
IEEE Trans Neural Netw Learn Syst. 2018 Apr;29(4):1226-1238. doi: 10.1109/TNNLS.2017.2661865. Epub 2017 Feb 28.
8. Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Admissibility and Termination Analysis.
IEEE Trans Neural Netw Learn Syst. 2017 Nov;28(11):2490-2502. doi: 10.1109/TNNLS.2016.2593743.
9. Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems.
IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):866-79. doi: 10.1109/TNNLS.2015.2401334. Epub 2015 Mar 2.
10. A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):937-42. doi: 10.1109/TSMCB.2008.920269.

Cited By

1. Trajectory Tracking within a Hierarchical Primitive-Based Learning Approach.
Entropy (Basel). 2022 Jun 28;24(7):889. doi: 10.3390/e24070889.