
Convergence Analysis of Value Iteration Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems

Authors

Xiao Geyang, Zhang Huaguang

Publication

IEEE Trans Cybern. 2024 Mar;54(3):1639-1649. doi: 10.1109/TCYB.2022.3232599. Epub 2024 Feb 9.

DOI: 10.1109/TCYB.2022.3232599
PMID: 37018707
Abstract

This article is concerned with the convergence properties and error-bound analysis of value iteration (VI) adaptive dynamic programming for continuous-time (CT) nonlinear systems. The relationship between the total value function and the single integral step cost is characterized under a contraction assumption. The convergence of VI is then proved when the initial condition is an arbitrary positive semidefinite function. Moreover, the accumulated effect of the approximation errors generated at each iteration is taken into account when approximators are used to implement the algorithm. Based on the contraction assumption, an error-bound condition is proposed that ensures the approximated iterative results converge to a neighborhood of the optimum, and the relation between the optimal solution and the approximated iterative results is also derived. To make the contraction assumption more concrete, an estimation method is proposed to derive a conservative value for it. Finally, three simulation cases are given to validate the theoretical results.
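
For readers who want the shape of the algorithm, here is a minimal sketch of the continuous-time value-iteration recursion the abstract refers to. The notation (V_i for the i-th value iterate, U for the stage cost, \Delta t for the integration interval) is ours, not the paper's:

\[
V_{i+1}(x(t)) = \min_{u}\left\{ \int_{t}^{t+\Delta t} U\bigl(x(\tau),u(\tau)\bigr)\,\mathrm{d}\tau + V_{i}\bigl(x(t+\Delta t)\bigr) \right\},
\qquad V_{0}(\cdot)\ \text{any positive semidefinite function}.
\]

On our reading of the abstract, the contraction assumption bounds the total value function by a finite multiple of this single integral step cost, i.e. something of the form

\[
V^{*}\bigl(x(t+\Delta t)\bigr) \le \beta \int_{t}^{t+\Delta t} U\bigl(x(\tau),u(\tau)\bigr)\,\mathrm{d}\tau
\quad \text{for some finite } \beta > 0,
\]

and it is this bound that drives both the convergence proof and the error-bound condition.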
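
As a quick numerical illustration of the "arbitrary positive semidefinite initial condition" claim, the toy Python script below (our own example, not one of the paper's three simulation cases; all names and parameter values in it are made up for illustration) runs the recursion above on a scalar linear system with a quadratic value ansatz V_i(x) = p_i * x^2, starts from V_0 = 0, and compares the resulting fixed point with the continuous-time Riccati solution:

from math import sqrt

# Toy sketch of CT value iteration (illustrative only, not the paper's setup):
# dynamics x' = a*x + b*u, stage cost U(x, u) = q*x^2 + r*u^2,
# quadratic ansatz V_i(x) = p_i * x^2, explicit Euler step of length dt.
a, b, q, r = -1.0, 1.0, 1.0, 1.0
dt = 1e-3
p = 0.0  # V_0 = 0 is positive semidefinite, hence an admissible start

for _ in range(200_000):
    # u = -k*x minimizes dt*(q*x^2 + r*u^2) + p*(x + dt*(a*x + b*u))^2
    k = p * b * (1 + a * dt) / (r + p * b**2 * dt)
    p_new = dt * (q + r * k**2) + p * (1 + dt * (a - b * k))**2
    converged = abs(p_new - p) < 1e-12
    p = p_new
    if converged:
        break

# Continuous-time Riccati equation 2*a*p + q - (b**2 / r) * p**2 = 0, for comparison.
p_star = (a + sqrt(a**2 + b**2 * q / r)) * r / b**2
print(f"VI fixed point p = {p:.6f}  vs  CT Riccati p* = {p_star:.6f}")

With dt = 1e-3 the two values agree to roughly three decimal places; the remaining gap is the O(Δt) bias of the Euler discretization, not a failure of the iteration.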


Similar Articles

1. Convergence Analysis of Value Iteration Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems.
   IEEE Trans Cybern. 2024 Mar;54(3):1639-1649. doi: 10.1109/TCYB.2022.3232599. Epub 2024 Feb 9.
2. Finite-approximation-error-based discrete-time iterative adaptive dynamic programming.
   IEEE Trans Cybern. 2014 Dec;44(12):2820-33. doi: 10.1109/TCYB.2014.2354377. Epub 2014 Sep 26.
3. Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems.
   IEEE Trans Neural Netw Learn Syst. 2015 Jun;26(6):1323-34. doi: 10.1109/TNNLS.2015.2402203. Epub 2015 Mar 3.
4. Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems.
   IEEE Trans Cybern. 2016 Mar;46(3):840-53. doi: 10.1109/TCYB.2015.2492242. Epub 2015 Nov 2.
5. Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems.
   IEEE Trans Cybern. 2013 Apr;43(2):779-89. doi: 10.1109/TSMCB.2012.2216523. Epub 2013 Mar 7.
6. Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors.
   IEEE Trans Cybern. 2022 Dec;52(12):13762-13773. doi: 10.1109/TCYB.2021.3108034. Epub 2022 Nov 18.
7. Improved value iteration for neural-network-based stochastic optimal control design.
   Neural Netw. 2020 Apr;124:280-295. doi: 10.1016/j.neunet.2020.01.004. Epub 2020 Jan 28.
8. Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems.
   IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):866-79. doi: 10.1109/TNNLS.2015.2401334. Epub 2015 Mar 2.
9. Constrained-Cost Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems.
   IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3251-3264. doi: 10.1109/TNNLS.2023.3237586. Epub 2024 Feb 29.
10. Continuous-Time Time-Varying Policy Iteration.
    IEEE Trans Cybern. 2020 Dec;50(12):4958-4971. doi: 10.1109/TCYB.2019.2926631. Epub 2020 Dec 3.

Cited By

1. Event-Trigger Reinforcement Learning-Based Coordinate Control of Modular Unmanned System via Nonzero-Sum Game.
   Sensors (Basel). 2025 Jan 7;25(2):314. doi: 10.3390/s25020314.
2. Reinforcement learning-based SDN routing scheme empowered by causality detection and GNN.
   Front Comput Neurosci. 2024 Apr 29;18:1393025. doi: 10.3389/fncom.2024.1393025. eCollection 2024.