IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):534-545. doi: 10.1109/TNNLS.2016.2544787.
In this paper, an offline approximate dynamic programming approach using neural networks is proposed for solving a class of finite-horizon stochastic optimal control problems. Two approaches are available in the literature: one based on the stochastic maximum principle (SMP) formalism and the other based on solving the stochastic Hamilton-Jacobi-Bellman (HJB) equation. In the presence of noise, however, the SMP formalism becomes complex and requires solving a couple of backward stochastic differential equations. Hence, current solution methodologies typically ignore the noise effect. In the HJB framework, on the other hand, including noise is straightforward. Furthermore, the stochastic HJB equation of a control-affine nonlinear stochastic system with a quadratic control cost function and an arbitrary state cost function can be formulated as a path integral (PI) problem. However, due to the curse of dimensionality, it might not be possible to use the PI formulation to obtain comprehensive solutions over the entire operating domain. A neural network structure called the adaptive critic design paradigm is used to handle this difficulty effectively. In this paper, a novel adaptive critic approach using the PI formulation is proposed for solving stochastic optimal control problems. The potential of the algorithm is demonstrated through simulation results on a couple of benchmark problems.
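The PI formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm and contains no adaptive critic network; it only shows the standard path integral identity for a single state, under assumptions that are ours: scalar dynamics dx = f(x) dt + u dt + sigma dW with unit control gain, running cost q(x) + (r/2) u^2, terminal cost phi(x), and lambda = r * sigma^2. Under those assumptions the optimal cost-to-go is V(x0) = -lambda * log E[exp(-S / lambda)], where S is the state cost accumulated along uncontrolled (u = 0) sample paths. The function name `pi_value_estimate` and all parameter names are hypothetical.

```python
import numpy as np


def pi_value_estimate(x0, horizon, drift, state_cost, terminal_cost,
                      sigma, r, n_paths=20000, n_steps=50, seed=0):
    """Monte Carlo path integral estimate of the optimal cost-to-go at x0.

    Assumed model (illustrative, not taken from the paper):
        dx = drift(x) dt + u dt + sigma dW
        cost = terminal_cost(x_T) + integral of state_cost(x) + 0.5 * r * u**2
    """
    rng = np.random.default_rng(seed)
    lam = r * sigma ** 2          # links control cost to noise intensity
    dt = horizon / n_steps

    x = np.full(n_paths, float(x0))
    path_cost = np.zeros(n_paths)

    # Euler-Maruyama simulation of the uncontrolled dynamics (u = 0).
    for _ in range(n_steps):
        path_cost += state_cost(x) * dt
        noise = rng.standard_normal(n_paths)
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * noise
    path_cost += terminal_cost(x)

    # V = -lam * log(mean(exp(-S / lam))), computed with a max shift
    # so that the exponentials do not underflow.
    z = -path_cost / lam
    z_max = z.max()
    log_psi = z_max + np.log(np.mean(np.exp(z - z_max)))
    return -lam * log_psi


if __name__ == "__main__":
    zero = lambda x: np.zeros_like(x)
    v = pi_value_estimate(1.0, 1.0, zero, zero, lambda x: 0.5 * x ** 2,
                          sigma=1.0, r=1.0)
    print(v)
```

For the linear-quadratic case in the usage lines (zero drift, no running state cost, terminal cost x^2 / 2, sigma = r = 1, horizon 1, x0 = 1), the Riccati solution gives the closed form V = 0.5 * ln(2) + 0.25, roughly 0.597, so the estimate should land near that value up to Monte Carlo error. The sketch evaluates one state at a time; covering the whole operating domain this way scales poorly with dimension, which is the difficulty the abstract assigns to the adaptive critic network.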