

Safe Model-Based Reinforcement Learning for Systems With Parametric Uncertainties

Authors

Mahmud S M Nahid, Nivison Scott A, Bell Zachary I, Kamalapurkar Rushikesh

Affiliations

School of Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, OK, United States.

Munitions Directorate, Air Force Research Laboratory, Eglin AFB, FL, United States.

Publication

Front Robot AI. 2021 Dec 16;8:733104. doi: 10.3389/frobt.2021.733104. eCollection 2021.

DOI: 10.3389/frobt.2021.733104
PMID: 34977161
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8717089/
Abstract

Reinforcement learning has been established over the past decade as an effective tool to find optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed in terms of state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop a safe reinforcement learning method for deterministic nonlinear systems with parametric uncertainties in the model, to learn approximate constrained optimal policies without relying on stringent excitation conditions. To that end, a model-based reinforcement learning technique that utilizes a novel filtered concurrent learning method, along with a barrier transformation, is developed in this paper to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
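The barrier transformation mentioned in the abstract maps a constrained state interval onto the whole real line, so that an unconstrained optimal-control problem can be solved in the transformed coordinates. A log-type form commonly used in this line of work (the paper's exact choice may differ) is, for a constraint x ∈ (a, A) with a < 0 < A:

```latex
% Log-type barrier transformation: maps the constrained interval (a, A)
% bijectively onto the real line, with b(0) = 0.
s = b(x) = \ln\!\left(\frac{A\,(a - x)}{a\,(A - x)}\right),
\qquad
x = b^{-1}(s) = \frac{aA\left(e^{s} - 1\right)}{a\,e^{s} - A}.
```

As x approaches either boundary, s diverges to ±∞, so any bounded trajectory of the transformed system corresponds to a state trajectory that respects the original constraint.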

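The other ingredient named in the abstract is concurrent learning: identifying the unknown parameters of a linearly parameterized model from a stack of recorded data, instead of requiring persistent excitation. The sketch below is illustrative only, on a hypothetical scalar system x_dot = θ1·x + θ2·x²; the paper's filtered variant additionally avoids measuring x_dot directly, which this plain version does not.

```python
# Minimal (unfiltered) concurrent-learning parameter estimator for a
# linearly parameterized system x_dot = Y(x) @ theta.
# Hypothetical example system, not the one from the paper.

def regressor(x):
    # Y(x) for the scalar example x_dot = theta1*x + theta2*x**2
    return [x, x * x]

def true_xdot(x, theta):
    y = regressor(x)
    return y[0] * theta[0] + y[1] * theta[1]

def concurrent_learning(data, gamma=0.05, steps=2000):
    """data: recorded (Y_i, xdot_i) pairs; gradient descent on the
    stacked prediction error. Converges to the true parameters when
    sum(Y_i^T Y_i) over the stack is full rank."""
    theta_hat = [0.0, 0.0]
    for _ in range(steps):
        g = [0.0, 0.0]
        for y, xd in data:
            err = xd - (y[0] * theta_hat[0] + y[1] * theta_hat[1])
            g[0] += y[0] * err
            g[1] += y[1] * err
        theta_hat[0] += gamma * g[0]
        theta_hat[1] += gamma * g[1]
    return theta_hat

theta_true = [-1.0, 0.5]
# Record data at a few distinct states so the stack is full rank.
history = [(regressor(x), true_xdot(x, theta_true))
           for x in (0.2, 0.5, 0.9, 1.3)]
theta_hat = concurrent_learning(history)
```

The key point, mirrored from the abstract: once the recorded stack is sufficiently rich, the estimate converges without any ongoing excitation of the running trajectory.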

Figures (PMC)

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/bf1bbdc55466/frobt-08-733104-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/8787fd692fd4/frobt-08-733104-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/3e3cd10cdc31/frobt-08-733104-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/6a979555419a/frobt-08-733104-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/204e80bf02a8/frobt-08-733104-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/1ffd8c6a9202/frobt-08-733104-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/c4a03f0c3909/frobt-08-733104-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/8c8cbff71d0e/frobt-08-733104-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2557/8717089/df57d5cc41e2/frobt-08-733104-g009.jpg

Similar Articles

1. Safe Model-Based Reinforcement Learning for Systems With Parametric Uncertainties. Front Robot AI. 2021 Dec 16;8:733104. doi: 10.3389/frobt.2021.733104. eCollection 2021.
2. Reinforcement Learning-Based Decentralized Safety Control for Constrained Interconnected Nonlinear Safety-Critical Systems. Entropy (Basel). 2023 Aug 2;25(8):1158. doi: 10.3390/e25081158.
3. Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):854-865. doi: 10.1109/TNNLS.2023.3326397. Epub 2025 Jan 7.
4. Reinforcement learning based adaptive optimal control for constrained nonlinear system via a novel state-dependent transformation. ISA Trans. 2023 Feb;133:29-41. doi: 10.1016/j.isatra.2022.07.006. Epub 2022 Jul 12.
5. Safe Reinforcement Learning With Dual Robustness. IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10876-10890. doi: 10.1109/TPAMI.2024.3443916. Epub 2024 Nov 6.
6. Safe Autonomous Driving with Latent Dynamics and State-Wise Constraints. Sensors (Basel). 2024 May 15;24(10):3139. doi: 10.3390/s24103139.
7. Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages. Sensors (Basel). 2021 Mar 13;21(6):2032. doi: 10.3390/s21062032.
8. Reinforcement learning-based consensus control for MASs with intermittent constraints. Neural Netw. 2024 Apr;172:106105. doi: 10.1016/j.neunet.2024.106105. Epub 2024 Jan 6.
9. Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints. IEEE Trans Cybern. 2015 Jul;45(7):1372-85. doi: 10.1109/TCYB.2015.2417170. Epub 2015 Apr 9.
10. Safe deep reinforcement learning in diesel engine emission control. Proc Inst Mech Eng Part I J Syst Control Eng. 2023 Sep;237(8):1440-1453. doi: 10.1177/09596518231153445. Epub 2023 Feb 17.

References Cited in This Article

1. Model-Based Reinforcement Learning for Infinite-Horizon Approximate Optimal Tracking. IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):753-758. doi: 10.1109/TNNLS.2015.2511658. Epub 2016 Feb 3.
2. Asymptotically Stable Adaptive-Optimal Control Algorithm With Saturating Actuators and Relaxed Persistence of Excitation. IEEE Trans Neural Netw Learn Syst. 2016 Nov;27(11):2386-2398. doi: 10.1109/TNNLS.2015.2487972. Epub 2015 Oct 26.
3. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.
4. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans Neural Netw Learn Syst. 2014 Mar;25(3):621-34. doi: 10.1109/TNNLS.2013.2281663.
5. Reinforcement learning in continuous time and space. Neural Comput. 2000 Jan;12(1):219-45. doi: 10.1162/089976600300015961.