Forward-Backward Sweep Method for the System of HJB-FP Equations in Memory-Limited Partially Observable Stochastic Control.

Author information

Tottori Takehiro, Kobayashi Tetsuya J

Affiliations

Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8654, Japan.

Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan.

Publication information

Entropy (Basel). 2023 Jan 21;25(2):208. doi: 10.3390/e25020208.

DOI:10.3390/e25020208

PMID:36832575

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9955073/

Abstract

Memory-limited partially observable stochastic control (ML-POSC) is the stochastic optimal control problem under incomplete information and memory limitation. To obtain the optimal control function of ML-POSC, a system of the forward Fokker-Planck (FP) equation and the backward Hamilton-Jacobi-Bellman (HJB) equation needs to be solved. In this work, we first show that the system of HJB-FP equations can be interpreted via Pontryagin's minimum principle on the probability density function space. Based on this interpretation, we then propose the forward-backward sweep method (FBSM) for ML-POSC. FBSM is one of the most basic algorithms for Pontryagin's minimum principle, which alternately computes the forward FP equation and the backward HJB equation in ML-POSC. Although the convergence of FBSM is generally not guaranteed in deterministic control and mean-field stochastic control, it is guaranteed in ML-POSC because the coupling of the HJB-FP equations is limited to the optimal control function in ML-POSC.
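The alternating structure of a forward-backward sweep can be illustrated on a much simpler problem than ML-POSC. The sketch below is not the authors' algorithm: it assumes a toy deterministic linear-quadratic problem (minimize the integral of (x^2 + u^2)/2 over [0, T] subject to dx/dt = u, x(0) = x0), in which a forward state ODE plays the role of the FP equation and a backward costate ODE plays the role of the HJB equation. The function name, parameters, and explicit Euler discretization are all illustrative choices.

```python
import numpy as np


def forward_backward_sweep(x0=1.0, T=1.0, n_steps=1000, n_iters=100, tol=1e-10):
    """Forward-backward sweep for a toy problem (an assumption, not ML-POSC).

    Minimize the integral of (x^2 + u^2) / 2 subject to dx/dt = u, x(0) = x0.
    Pontryagin's minimum principle gives the costate equation
    d(lam)/dt = -x with lam(T) = 0, and the optimal control u = -lam.
    """
    dt = T / n_steps
    u = np.zeros(n_steps + 1)    # initial guess for the control
    x = np.zeros(n_steps + 1)    # state trajectory
    lam = np.zeros(n_steps + 1)  # costate trajectory

    for _ in range(n_iters):
        # Forward sweep: integrate the state from t = 0 with the current control.
        x[0] = x0
        for k in range(n_steps):
            x[k + 1] = x[k] + dt * u[k]

        # Backward sweep: integrate the costate from t = T with the current state.
        lam[-1] = 0.0
        for k in range(n_steps, 0, -1):
            lam[k - 1] = lam[k] + dt * x[k]

        # Control update from the minimum condition dH/du = u + lam = 0.
        u_new = -lam
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break

    return x, lam, u


if __name__ == "__main__":
    x, lam, u = forward_backward_sweep()
    # Closed-form solution of this toy problem: u(0) = -tanh(T).
    print("u(0) numerical:", u[0], "analytical:", -np.tanh(1.0))
```

In this toy setting the sweep converges without any relaxation only because the horizon is short enough for the update to be a contraction; a longer horizon would typically require damping the control update, which is consistent with the abstract's remark that convergence is not guaranteed in general deterministic control.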
