

Reinforcement learning-based estimation for spatio-temporal systems

Authors

Mowlavi Saviz, Benosman Mouhacine

Affiliation

Mitsubishi Electric Research Laboratories, Cambridge, MA, 02139, USA.

Publication

Sci Rep. 2024 Sep 28;14(1):22464. doi: 10.1038/s41598-024-72055-1.

DOI: 10.1038/s41598-024-72055-1
PMID: 39341856
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11439076/
Abstract

State estimators such as Kalman filters compute an estimate of the instantaneous state of a dynamical system from sparse sensor measurements. For spatio-temporal systems, whose dynamics are governed by partial differential equations (PDEs), state estimators are typically designed based on a reduced-order model (ROM) that projects the original high-dimensional PDE onto a computationally tractable low-dimensional space. However, ROMs are prone to large errors, which negatively affects the performance of the estimator. Here, we introduce the reinforcement learning reduced-order estimator (RL-ROE), a ROM-based estimator in which the correction term that takes in the measurements is given by a nonlinear policy trained through reinforcement learning. The nonlinearity of the policy enables the RL-ROE to compensate efficiently for errors of the ROM, while still taking advantage of the imperfect knowledge of the dynamics. Using examples involving the Burgers and Navier-Stokes equations with parametric uncertainties, we show that in the limit of very few sensors, the trained RL-ROE outperforms a Kalman filter designed using the same ROM and yields accurate instantaneous estimates of high-dimensional states corresponding to unknown initial conditions and physical parameter values. The RL-ROE opens the door to lightweight real-time sensing of systems governed by parametric PDEs.
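The estimator structure the abstract describes, a forecast step using the reduced-order model followed by a measurement-driven correction produced by a learned nonlinear policy, can be sketched as follows. This is a minimal illustration under assumed shapes, not the authors' implementation: the ROM dynamics matrix `A`, sensor map `C`, and the small tanh network standing in for the trained policy are all hypothetical placeholders, and the policy weights here are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

r, m = 8, 2  # reduced-order state dimension, number of sparse sensors

# Stand-in ROM dynamics (stable linear map) and sensor map in ROM coordinates.
A = 0.95 * np.eye(r) + 0.01 * rng.standard_normal((r, r))
C = rng.standard_normal((m, r))

# Hypothetical "policy": a tiny tanh MLP mapping (innovation, state) -> correction.
# In the RL-ROE this mapping is trained with reinforcement learning.
W1 = 0.1 * rng.standard_normal((16, m + r))
W2 = 0.1 * rng.standard_normal((r, 16))

def policy(innovation, z_hat):
    """Nonlinear correction term in place of a fixed Kalman gain."""
    h = np.tanh(W1 @ np.concatenate([innovation, z_hat]))
    return W2 @ h

def rl_roe_step(z_hat, y):
    """One estimator update: ROM forecast + learned measurement correction."""
    z_pred = A @ z_hat           # forecast with the (imperfect) reduced model
    innovation = y - C @ z_pred  # mismatch at the sensor locations
    return z_pred + policy(innovation, z_pred)

# Run the estimator on sensor readings from a simulated "true" reduced trajectory.
z_true = rng.standard_normal(r)
z_hat = np.zeros(r)
for _ in range(50):
    z_true = A @ z_true
    z_hat = rl_roe_step(z_hat, C @ z_true)
```

With untrained weights the correction is arbitrary, so no tracking accuracy is claimed here; the paper trains the policy over episodes spanning varied initial conditions and physical parameters. The Kalman filter baseline mentioned in the abstract corresponds to replacing `policy` with a fixed linear gain, `K @ innovation`.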


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0025/11439076/4dc92844eae1/41598_2024_72055_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0025/11439076/9483e55b05ff/41598_2024_72055_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0025/11439076/7af9fad21b1c/41598_2024_72055_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0025/11439076/7324ddf61601/41598_2024_72055_Fig4_HTML.jpg

Similar Articles

1
State estimator based on an indirect Kalman filter for a hydraulically actuated multibody system.
Multibody Syst Dyn. 2022;54(4):373-398. doi: 10.1007/s11044-022-09814-3. Epub 2022 Feb 22.
2
Learning partial differential equations for biological transport models from noisy spatio-temporal data.
Proc Math Phys Eng Sci. 2020 Feb;476(2234):20190800. doi: 10.1098/rspa.2019.0800. Epub 2020 Feb 19.
3
Partial Policy-Based Reinforcement Learning for Anatomical Landmark Localization in 3D Medical Images.
IEEE Trans Med Imaging. 2020 Apr;39(4):1245-1255. doi: 10.1109/TMI.2019.2946345. Epub 2019 Oct 9.
4
Control of chaotic systems by deep reinforcement learning.
Proc Math Phys Eng Sci. 2019 Nov;475(2231):20190351. doi: 10.1098/rspa.2019.0351. Epub 2019 Nov 6.
5
Detection of parametric changes in the Peyrard-Bishop-Dauxois model of DNA using nonlinear Kalman filtering.
J Biol Phys. 2015 Jan;41(1):59-83. doi: 10.1007/s10867-014-9366-8. Epub 2014 Oct 9.
6
Robust Learning-Based Control for Uncertain Nonlinear Systems With Validation on a Soft Robot.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):510-524. doi: 10.1109/TNNLS.2023.3328643. Epub 2025 Jan 7.
7
State estimation of stochastic non-linear hybrid dynamic system using an interacting multiple model algorithm.
ISA Trans. 2015 Sep;58:520-32. doi: 10.1016/j.isatra.2015.06.005. Epub 2015 Aug 21.
8
Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation.
Sensors (Basel). 2022 Feb 11;22(4):1393. doi: 10.3390/s22041393.
9
Multimodel Kalman filtering for adaptive nonuniformity correction in infrared sensors.
J Opt Soc Am A Opt Image Sci Vis. 2006 Jun;23(6):1282-91. doi: 10.1364/josaa.23.001282.

References Cited in This Article

1
Reinforcement learning state estimator.
Neural Comput. 2007 Mar;19(3):730-56. doi: 10.1162/neco.2007.19.3.730.