

Reinforcement learning-based estimation for spatio-temporal systems

Authors

Mowlavi Saviz, Benosman Mouhacine

Affiliation

Mitsubishi Electric Research Laboratories, Cambridge, MA, 02139, USA.

Publication

Sci Rep. 2024 Sep 28;14(1):22464. doi: 10.1038/s41598-024-72055-1.

DOI: 10.1038/s41598-024-72055-1
PMID: 39341856
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11439076/
Abstract

State estimators such as Kalman filters compute an estimate of the instantaneous state of a dynamical system from sparse sensor measurements. For spatio-temporal systems, whose dynamics are governed by partial differential equations (PDEs), state estimators are typically designed based on a reduced-order model (ROM) that projects the original high-dimensional PDE onto a computationally tractable low-dimensional space. However, ROMs are prone to large errors, which negatively affects the performance of the estimator. Here, we introduce the reinforcement learning reduced-order estimator (RL-ROE), a ROM-based estimator in which the correction term that takes in the measurements is given by a nonlinear policy trained through reinforcement learning. The nonlinearity of the policy enables the RL-ROE to compensate efficiently for errors of the ROM, while still taking advantage of the imperfect knowledge of the dynamics. Using examples involving the Burgers and Navier-Stokes equations with parametric uncertainties, we show that in the limit of very few sensors, the trained RL-ROE outperforms a Kalman filter designed using the same ROM and yields accurate instantaneous estimates of high-dimensional states corresponding to unknown initial conditions and physical parameter values. The RL-ROE opens the door to lightweight real-time sensing of systems governed by parametric PDEs.
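The estimator structure the abstract describes, a forecast step using the reduced-order model followed by a measurement-driven correction produced by a learned nonlinear policy, can be sketched as follows. This is a minimal illustration under assumed shapes, not the authors' implementation: the ROM dynamics matrix `A`, sensor map `C`, and the small tanh network standing in for the trained policy are all hypothetical placeholders, and the policy weights here are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

r, m = 8, 2  # reduced-order state dimension, number of sparse sensors

# Stand-in ROM dynamics (stable linear map) and sensor map in ROM coordinates.
A = 0.95 * np.eye(r) + 0.01 * rng.standard_normal((r, r))
C = rng.standard_normal((m, r))

# Hypothetical "policy": a tiny tanh MLP mapping (innovation, state) -> correction.
# In the RL-ROE this mapping is trained with reinforcement learning.
W1 = 0.1 * rng.standard_normal((16, m + r))
W2 = 0.1 * rng.standard_normal((r, 16))

def policy(innovation, z_hat):
    """Nonlinear correction term in place of a fixed Kalman gain."""
    h = np.tanh(W1 @ np.concatenate([innovation, z_hat]))
    return W2 @ h

def rl_roe_step(z_hat, y):
    """One estimator update: ROM forecast + learned measurement correction."""
    z_pred = A @ z_hat           # forecast with the (imperfect) reduced model
    innovation = y - C @ z_pred  # mismatch at the sensor locations
    return z_pred + policy(innovation, z_pred)

# Run the estimator on sensor readings from a simulated "true" reduced trajectory.
z_true = rng.standard_normal(r)
z_hat = np.zeros(r)
for _ in range(50):
    z_true = A @ z_true
    z_hat = rl_roe_step(z_hat, C @ z_true)
```

With untrained weights the correction is arbitrary, so no tracking accuracy is claimed here; the paper trains the policy over episodes spanning varied initial conditions and physical parameters. The Kalman filter baseline mentioned in the abstract corresponds to replacing `policy` with a fixed linear gain, `K @ innovation`.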


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0025/11439076/4dc92844eae1/41598_2024_72055_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0025/11439076/9483e55b05ff/41598_2024_72055_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0025/11439076/7af9fad21b1c/41598_2024_72055_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0025/11439076/7324ddf61601/41598_2024_72055_Fig4_HTML.jpg

Similar Articles

1
State estimator based on an indirect Kalman filter for a hydraulically actuated multibody system.
Multibody Syst Dyn. 2022;54(4):373-398. doi: 10.1007/s11044-022-09814-3. Epub 2022 Feb 22.
2
Learning partial differential equations for biological transport models from noisy spatio-temporal data.
Proc Math Phys Eng Sci. 2020 Feb;476(2234):20190800. doi: 10.1098/rspa.2019.0800. Epub 2020 Feb 19.
3
Partial Policy-Based Reinforcement Learning for Anatomical Landmark Localization in 3D Medical Images.
IEEE Trans Med Imaging. 2020 Apr;39(4):1245-1255. doi: 10.1109/TMI.2019.2946345. Epub 2019 Oct 9.
4
Control of chaotic systems by deep reinforcement learning.
Proc Math Phys Eng Sci. 2019 Nov;475(2231):20190351. doi: 10.1098/rspa.2019.0351. Epub 2019 Nov 6.
5
Detection of parametric changes in the Peyrard-Bishop-Dauxois model of DNA using nonlinear Kalman filtering.
J Biol Phys. 2015 Jan;41(1):59-83. doi: 10.1007/s10867-014-9366-8. Epub 2014 Oct 9.
6
Robust Learning-Based Control for Uncertain Nonlinear Systems With Validation on a Soft Robot.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):510-524. doi: 10.1109/TNNLS.2023.3328643. Epub 2025 Jan 7.
7
State estimation of stochastic non-linear hybrid dynamic system using an interacting multiple model algorithm.
ISA Trans. 2015 Sep;58:520-32. doi: 10.1016/j.isatra.2015.06.005. Epub 2015 Aug 21.
8
Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation.
Sensors (Basel). 2022 Feb 11;22(4):1393. doi: 10.3390/s22041393.
9
Multimodel Kalman filtering for adaptive nonuniformity correction in infrared sensors.
J Opt Soc Am A Opt Image Sci Vis. 2006 Jun;23(6):1282-91. doi: 10.1364/josaa.23.001282.

References Cited in This Article

1
Reinforcement learning state estimator.
Neural Comput. 2007 Mar;19(3):730-56. doi: 10.1162/neco.2007.19.3.730.