IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):500-509. doi: 10.1109/TNNLS.2015.2503980. Epub 2015 Dec 22.
In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general assumptions, we establish the uniqueness of the solution of Bellman's equation, and we provide convergence results for value and policy iterations.
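The value iteration method the abstract refers to can be illustrated on a toy problem. The sketch below is not the paper's formulation; it is a minimal, assumed example: a small deterministic shortest-path problem with positive stage costs and a terminal (goal) state, where repeated Bellman updates converge to the optimal cost-to-go, and a greedy policy is then read off from the converged values.

```python
# Minimal sketch of value iteration for a shortest-path problem with a
# terminal state (an assumed toy example, not the paper's general setting).

# Toy deterministic problem: states 0..3, state 3 is the terminal set.
# transitions[s] maps each action name to (next_state, stage_cost > 0).
transitions = {
    0: {"a": (1, 1.0), "b": (2, 4.0)},
    1: {"a": (3, 1.0)},
    2: {"a": (3, 1.0)},
}
TERMINAL = 3  # cost-to-go at the terminal state is 0 by definition

def value_iteration(tol=1e-9, max_iter=1000):
    """Iterate the Bellman update V(s) <- min_a [cost(s,a) + V(next(s,a))]."""
    V = {0: 0.0, 1: 0.0, 2: 0.0, TERMINAL: 0.0}
    for _ in range(max_iter):
        delta = 0.0
        for s, acts in transitions.items():
            new_v = min(c + V[ns] for (ns, c) in acts.values())
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:  # stop once the updates have converged
            break
    return V

V = value_iteration()

# Extract a greedy (optimal) policy from the converged cost-to-go values.
policy = {s: min(acts, key=lambda a: acts[a][1] + V[acts[a][0]])
          for s, acts in transitions.items()}

print(V[0])       # optimal cost-to-go from state 0
print(policy[0])  # greedy action at state 0
```

In this toy instance the update converges after a few sweeps: the cost-to-go from state 0 is 2.0 (via state 1), and the greedy policy at state 0 picks action "a". Policy iteration would alternate between such a policy-extraction step and an exact evaluation of the current policy's cost.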