Lin Ziyu, Duan Jingliang, Li Shengbo Eben, Ma Haitong, Li Jie, Chen Jianyu, Cheng Bo, Ma Jun
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5255-5267. doi: 10.1109/TNNLS.2022.3225090. Epub 2023 Sep 1.
The Hamilton-Jacobi-Bellman (HJB) equation serves as the necessary and sufficient condition for the optimal solution to the continuous-time (CT) optimal control problem (OCP). Compared with the infinite-horizon HJB equation, solving the finite-horizon (FH) HJB equation has been a long-standing challenge, because the partial time derivative of the value function appears as an additional unknown term. To address this problem, this study establishes, for the first time, a link between the partial time derivative and the terminal-time utility function, which facilitates the use of the policy iteration (PI) technique to solve CT FH OCPs. Based on this key finding, an FH approximate dynamic programming (ADP) algorithm is proposed, leveraging an actor-critic framework. It is shown that the algorithm exhibits important properties in terms of convergence and optimality. Importantly, with the use of multilayer neural networks (NNs) in the actor-critic architecture, the algorithm is suitable for CT FH OCPs for more general nonlinear and complex systems. Finally, the effectiveness of the proposed algorithm is demonstrated by conducting a series of simulations on both a linear quadratic regulator (LQR) problem and a nonlinear vehicle tracking problem.
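For the LQR benchmark mentioned above, the FH HJB equation admits a closed-form structure: the value function is quadratic, V(x, t) = xᵀP(t)x, and its partial time derivative is governed by the differential Riccati equation integrated backward from the terminal cost. The sketch below is not the paper's NN-based ADP algorithm; it is a minimal baseline showing how the terminal-time boundary condition fixes the time-varying value function in the CT FH LQR case. All system matrices are illustrative placeholders.

```python
import numpy as np

# Illustrative CT linear system dx/dt = A x + B u with quadratic cost
# integral of (x'Qx + u'Ru) dt over [0, T], plus terminal cost x'Qf x.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # running state cost
R = np.array([[1.0]])  # running control cost
Qf = np.eye(2)         # terminal cost: boundary condition P(T) = Qf
T, dt = 5.0, 1e-3

def riccati_rhs(P):
    """Right-hand side of -dP/dt in the differential Riccati equation:
    -dP/dt = A'P + PA - P B R^{-1} B' P + Q."""
    return A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q

# Integrate backward in time from t = T to t = 0 with explicit Euler.
P = Qf.copy()
for _ in range(int(T / dt)):
    P = P + dt * riccati_rhs(P)

# Optimal time-varying feedback at t = 0: u = -K x with K = R^{-1} B' P(0).
K = np.linalg.solve(R, B.T @ P)
print("P(0) =\n", P)
print("K(0) =", K)
```

Because the quadratic value function makes the partial time derivative explicit (dV/dt = xᵀṖ(t)x), this baseline is a useful sanity check for any approximate FH solver on linear systems.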