Suppr 超能文献



Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems.

Authors

Guo Wentao, Si Jennie, Liu Feng, Mei Shengwei

Publication

IEEE Trans Neural Netw Learn Syst. 2018 Jul;29(7):2794-2807. doi: 10.1109/TNNLS.2017.2702566. Epub 2017 Jun 6.

DOI:10.1109/TNNLS.2017.2702566
PMID:28600262
Abstract

Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which has been extensively covered in controls literature for its good theoretical properties and for its success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples including a practical problem of excitation control of a hydrogenerator.
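The policy iteration scheme the abstract builds on alternates policy evaluation (solving the Bellman equation under a fixed policy) with greedy policy improvement. A minimal tabular sketch on a hypothetical discretized scalar system; the dynamics, grids, and the small discount factor are illustrative simplifications (the paper treats the undiscounted case with function approximation):

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 41)   # discretized state space (illustrative)
us = np.linspace(-1.0, 1.0, 21)   # discretized action space

def f(x, u):                       # hypothetical system x_{k+1} = 0.9*sin(x_k) + u_k
    return 0.9 * np.sin(x) + u

def cost(x, u):                    # quadratic stage cost x^2 + u^2
    return x**2 + u**2

def nearest(x):                    # project the next state onto the grid
    return int(np.argmin(np.abs(xs - np.clip(x, xs[0], xs[-1]))))

gamma = 0.95                       # mild discounting for numerical convergence only
V = np.zeros(len(xs))
policy = np.full(len(xs), 10)      # initial policy: u = us[10] = 0 (stabilizing here)

for _ in range(50):
    # Policy evaluation: sweep the Bellman equation under the fixed policy.
    for _ in range(200):
        V = np.array([cost(xs[i], us[policy[i]])
                      + gamma * V[nearest(f(xs[i], us[policy[i]]))]
                      for i in range(len(xs))])
    # Policy improvement: act greedily w.r.t. the evaluated value function.
    new_policy = np.array([int(np.argmin([cost(xs[i], u) + gamma * V[nearest(f(xs[i], u))]
                                          for u in us]))
                           for i in range(len(xs))])
    if np.array_equal(new_policy, policy):  # policy unchanged => converged
        break
    policy = new_policy
```

In the paper's setting the inexact step is the policy improvement: the greedy policy is replaced by a parameterized approximation, and the analysis bounds the effect of that approximation error on stability and value convergence.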

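For the policy approximator the authors use a Volterra series, a polynomial expansion in current and past states. A truncated second-order Volterra feedback law can be sketched as follows; the memory length and kernel values here are arbitrary placeholders, not the paper's design:

```python
import numpy as np

M = 3                                 # memory length of the truncated series (illustrative)
rng = np.random.default_rng(0)
a = rng.normal(size=M)                # first-order (linear) kernel
b = rng.normal(size=(M, M))           # second-order kernel

def volterra_policy(x_hist):
    """Control from the last M states: u = a^T x + x^T b x (second-order truncation)."""
    x = np.asarray(x_hist[-M:], dtype=float)
    return float(a @ x + x @ b @ x)
```

The kernels `a` and `b` play the role of the policy parameters that would be fit to the greedy policy at each improvement step.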

Similar Articles

1. Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems.
IEEE Trans Neural Netw Learn Syst. 2018 Jul;29(7):2794-2807. doi: 10.1109/TNNLS.2017.2702566. Epub 2017 Jun 6.
2. Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems.
IEEE Trans Neural Netw Learn Syst. 2015 Jun;26(6):1323-34. doi: 10.1109/TNNLS.2015.2402203. Epub 2015 Mar 3.
3. Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems.
IEEE Trans Cybern. 2016 Mar;46(3):840-53. doi: 10.1109/TCYB.2015.2492242. Epub 2015 Nov 2.
4. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems.
IEEE Trans Neural Netw Learn Syst. 2014 Mar;25(3):621-34. doi: 10.1109/TNNLS.2013.2281663.
5. Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems.
IEEE Trans Cybern. 2013 Apr;43(2):779-89. doi: 10.1109/TSMCB.2012.2216523. Epub 2013 Mar 7.
6. Finite-approximation-error-based discrete-time iterative adaptive dynamic programming.
IEEE Trans Cybern. 2014 Dec;44(12):2820-33. doi: 10.1109/TCYB.2014.2354377. Epub 2014 Sep 26.
7. Discrete-Time Stable Generalized Self-Learning Optimal Control With Approximation Errors.
IEEE Trans Neural Netw Learn Syst. 2018 Apr;29(4):1226-1238. doi: 10.1109/TNNLS.2017.2661865. Epub 2017 Feb 28.
8. Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Admissibility and Termination Analysis.
IEEE Trans Neural Netw Learn Syst. 2017 Nov;28(11):2490-2502. doi: 10.1109/TNNLS.2016.2593743.
9. Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems.
IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):866-79. doi: 10.1109/TNNLS.2015.2401334. Epub 2015 Mar 2.
10. A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):937-42. doi: 10.1109/TSMCB.2008.920269.

Cited By

1. Trajectory Tracking within a Hierarchical Primitive-Based Learning Approach.
Entropy (Basel). 2022 Jun 28;24(7):889. doi: 10.3390/e24070889.