Neural Network-Based Solutions for Stochastic Optimal Control Using Path Integrals.

Publication information

IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):534-545. doi: 10.1109/TNNLS.2016.2544787.

Abstract

In this paper, an offline approximate dynamic programming approach using neural networks is proposed for solving a class of finite horizon stochastic optimal control problems. There are two approaches available in the literature, one based on the stochastic maximum principle (SMP) formalism and the other based on solving the stochastic Hamilton-Jacobi-Bellman (HJB) equation. However, in the presence of noise, the SMP formalism becomes complex and results in having to solve a couple of backward stochastic differential equations. Hence, current solution methodologies typically ignore the noise effect. On the other hand, the inclusion of noise in the HJB framework is very straightforward. Furthermore, the stochastic HJB equation of a control-affine nonlinear stochastic system with a quadratic control cost function and an arbitrary state cost function can be formulated as a path integral (PI) problem. However, due to the curse of dimensionality, it might not be possible to utilize the PI formulation for obtaining comprehensive solutions over the entire operating domain. A neural network structure called the adaptive critic design paradigm is used to effectively handle this difficulty. In this paper, a novel adaptive critic approach using the PI formulation is proposed for solving stochastic optimal control problems. The potential of the algorithm is demonstrated through simulation results from a couple of benchmark problems.
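The abstract does not give the PI equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the standard path integral control setting: scalar dynamics dx = u dt + sigma dW, control cost (R/2)u^2, and the usual coupling lam = R * sigma^2 between noise level and control cost. Under those assumptions the optimal cost-to-go can be estimated as V(x, t) = -lam * log E[exp(-S / lam)], where S is the state cost plus terminal cost accumulated along uncontrolled (u = 0) sample paths. All function and parameter names here are hypothetical. The sketch evaluates V at a single state only; covering the whole operating domain this way is what becomes expensive, which is the difficulty the abstract addresses with a neural network critic.

```python
import math
import random


def pi_value_estimate(x0, horizon, sigma, lam, state_cost, terminal_cost,
                      n_paths=20000, n_steps=20, seed=0):
    """Monte Carlo path integral estimate of the cost-to-go at a single state.

    Assumption (not from the source): dx = u dt + sigma dW with control cost
    (R/2) u^2 and lam = R * sigma^2, so V = -lam * log E[exp(-S / lam)],
    with S accumulated along uncontrolled paths.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    total = 0.0
    for _ in range(n_paths):
        x = x0
        path_cost = 0.0
        for _ in range(n_steps):
            path_cost += state_cost(x) * dt                   # running state cost
            x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)  # uncontrolled Euler step
        path_cost += terminal_cost(x)
        total += math.exp(-path_cost / lam)
    psi = total / n_paths                                     # expected exp(-S / lam)
    return -lam * math.log(psi)


def lq_exact_value(x0, horizon, sigma, r, c):
    """Closed form for zero state cost and terminal cost (c/2) x^2 (same assumptions)."""
    lam = r * sigma ** 2
    k = 1.0 + c * horizon / r
    return 0.5 * lam * math.log(k) + 0.5 * c * x0 ** 2 / k


if __name__ == "__main__":
    sigma, r, c = 1.0, 1.0, 1.0
    estimate = pi_value_estimate(
        x0=1.0, horizon=1.0, sigma=sigma, lam=r * sigma ** 2,
        state_cost=lambda x: 0.0,
        terminal_cost=lambda x: 0.5 * c * x ** 2,
    )
    print("Monte Carlo estimate:", estimate)
    print("Closed-form value:   ", lq_exact_value(1.0, 1.0, sigma, r, c))
```

The linear-quadratic case is used only because it has a closed form to compare against; the estimator itself accepts an arbitrary state cost function, matching the problem class described in the abstract.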

Similar articles

Finite-Time Adaptive Dynamic Programming for Affine-Form Nonlinear Systems.
IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):3573-3586. doi: 10.1109/TNNLS.2023.3337387. Epub 2025 Feb 6.
