Approximate Dynamic Programming: Combining Regional and Local State Following Approximations.

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2154-2166. doi: 10.1109/TNNLS.2018.2808102.

DOI: 10.1109/TNNLS.2018.2808102
PMID: 29771668
Abstract

An infinite-horizon optimal regulation problem for a control-affine deterministic system is solved online using a local state following (StaF) kernel and a regional model-based reinforcement learning (R-MBRL) method to approximate the value function. Unlike traditional methods such as R-MBRL that aim to approximate the value function over a large compact set, the StaF kernel approach aims to approximate the value function in a local neighborhood of the state that travels within a compact set. In this paper, the value function is approximated using a state-dependent convex combination of the StaF-based and the R-MBRL-based approximations. As the state enters a neighborhood containing the origin, the value function transitions from being approximated by the StaF approach to the R-MBRL approach. Semiglobal uniformly ultimately bounded (SGUUB) convergence of the system states to the origin is established using a Lyapunov-based analysis. Simulation results are provided for two, three, six, and ten-state dynamical systems to demonstrate the scalability and performance of the developed method.
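
The state-dependent convex combination described in the abstract can be illustrated with a minimal sketch. It assumes a weight lambda(x) that equals 1 near the origin (regional R-MBRL approximation) and 0 far from it (local StaF approximation), with a smooth transition in between. The abstract does not give the paper's weight function, so `blend_weight`, the radii `r_inner` and `r_outer`, the smoothstep transition, and the two placeholder approximators are all hypothetical choices made for illustration.

```python
import math


def blend_weight(x, r_inner=0.5, r_outer=1.0):
    """Hypothetical weight lambda(x) in [0, 1].

    Returns 1 inside the ball of radius r_inner around the origin,
    0 outside the ball of radius r_outer, and a smooth value in between.
    The radii and the smoothstep shape are assumptions, not from the paper.
    """
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= r_inner:
        return 1.0
    if norm >= r_outer:
        return 0.0
    # Smoothstep interpolation between the two radii (illustrative choice).
    s = (r_outer - norm) / (r_outer - r_inner)
    return s * s * (3.0 - 2.0 * s)


def blended_value(x, v_regional, v_local, r_inner=0.5, r_outer=1.0):
    """Convex combination of a regional and a local value estimate.

    v_regional and v_local are placeholder callables standing in for the
    R-MBRL-based and StaF-based approximations, respectively.
    """
    lam = blend_weight(x, r_inner, r_outer)
    return lam * v_regional(x) + (1.0 - lam) * v_local(x)


if __name__ == "__main__":
    # Placeholder approximators: constant values chosen only for the demo.
    regional = lambda x: 2.0
    local = lambda x: 4.0
    for state in ([0.0, 0.0], [0.75, 0.0], [2.0, 0.0]):
        print(state, blended_value(state, regional, local))
```

Because the weight stays in [0, 1], the blended estimate always lies between the two individual estimates, and it reduces to the regional estimate once the state is inside the inner neighborhood of the origin.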

Similar articles

1. Approximate Dynamic Programming: Combining Regional and Local State Following Approximations.
   IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2154-2166. doi: 10.1109/TNNLS.2018.2808102.
2. Model-Based Reinforcement Learning for Infinite-Horizon Approximate Optimal Tracking.
   IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):753-758. doi: 10.1109/TNNLS.2015.2511658. Epub 2016 Feb 3.
3. Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update.
   IEEE Trans Neural Netw Learn Syst. 2012 Jul;23(7):1118-29. doi: 10.1109/TNNLS.2012.2196708.
4. The State Following Approximation Method.
   IEEE Trans Neural Netw Learn Syst. 2019 Jun;30(6):1716-1730. doi: 10.1109/TNNLS.2018.2870040. Epub 2018 Oct 25.
5. Event-Triggered Distributed Control of Nonlinear Interconnected Systems Using Online Reinforcement Learning With Exploration.
   IEEE Trans Cybern. 2018 Sep;48(9):2510-2519. doi: 10.1109/TCYB.2017.2741342. Epub 2017 Sep 7.
6. Robust Neuro-Optimal Control of Underactuated Snake Robots With Experience Replay.
   IEEE Trans Neural Netw Learn Syst. 2018 Jan;29(1):208-217. doi: 10.1109/TNNLS.2017.2768820.
7. Neural network-based finite horizon stochastic optimal control design for nonlinear networked control systems.
   IEEE Trans Neural Netw Learn Syst. 2015 Mar;26(3):472-85. doi: 10.1109/TNNLS.2014.2315622.
8. Revisiting approximate dynamic programming and its convergence.
   IEEE Trans Cybern. 2014 Dec;44(12):2733-43. doi: 10.1109/TCYB.2014.2314612. Epub 2014 May 16.
9. F-Discrepancy for Efficient Sampling in Approximate Dynamic Programming.
   IEEE Trans Cybern. 2016 Jul;46(7):1628-39. doi: 10.1109/TCYB.2015.2453123. Epub 2015 Jul 29.
10. Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems.
    IEEE Trans Neural Netw Learn Syst. 2015 Jun;26(6):1323-34. doi: 10.1109/TNNLS.2015.2402203. Epub 2015 Mar 3.

Cited by

1. Trajectory Tracking within a Hierarchical Primitive-Based Learning Approach.
   Entropy (Basel). 2022 Jun 28;24(7):889. doi: 10.3390/e24070889.