Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems.

Publication information

IEEE Trans Neural Netw Learn Syst. 2015 Jan;26(1):140-51. doi: 10.1109/TNNLS.2014.2358227. Epub 2014 Oct 8.

DOI:10.1109/TNNLS.2014.2358227
PMID:25312944
Abstract

This paper presents a partially model-free adaptive optimal control solution to the deterministic nonlinear discrete-time (DT) tracking control problem in the presence of input constraints. The tracking error dynamics and reference trajectory dynamics are first combined to form an augmented system. Then, a new discounted performance function based on the augmented system is presented for the optimal nonlinear tracking problem. In contrast to the standard solution, which finds the feedforward and feedback terms of the control input separately, the minimization of the proposed discounted performance function gives both feedback and feedforward parts of the control input simultaneously. This enables us to encode the input constraints into the optimization problem using a nonquadratic performance function. The DT tracking Bellman equation and the tracking Hamilton-Jacobi-Bellman (HJB) equation are derived. An actor-critic-based reinforcement learning algorithm is used to learn the solution to the tracking HJB equation online without requiring knowledge of the system drift dynamics. That is, two neural networks (NNs), namely, an actor NN and a critic NN, are tuned online and simultaneously to generate the optimal bounded control policy. A simulation example is given to show the effectiveness of the proposed method.
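
The abstract gives no formulas, so the following is only a sketch of how such a formulation is commonly written in the constrained-input optimal tracking literature. All symbols are assumptions introduced here for illustration and are not taken from the abstract: $x_k$ is the state, $r_k$ the reference, $e_k = x_k - r_k$ the tracking error, $X_k$ the augmented state, $F$ and $G$ the augmented drift and input dynamics, $\gamma \in (0,1]$ the discount factor, $\bar{u}$ the input bound, $Q \succeq 0$, and $R \succ 0$ diagonal.

```latex
% Assumed affine plant and reference generator (illustrative notation only)
x_{k+1} = f(x_k) + g(x_k)\,u_k, \qquad r_{k+1} = \psi(r_k)

% Augmented system built from tracking error and reference
X_k = \begin{bmatrix} e_k \\ r_k \end{bmatrix}, \qquad
X_{k+1} = F(X_k) + G(X_k)\,u_k

% Discounted performance function with a nonquadratic input penalty
V(X_k) = \sum_{i=k}^{\infty} \gamma^{\,i-k}
         \left[ X_i^{\top} Q_1 X_i + W(u_i) \right], \qquad
Q_1 = \begin{bmatrix} Q & 0 \\ 0 & 0 \end{bmatrix}

W(u) = 2 \int_{0}^{u} \left( \bar{u}\,\tanh^{-1}(v/\bar{u}) \right)^{\top} R \, dv

% Tracking Bellman equation
V(X_k) = X_k^{\top} Q_1 X_k + W(u_k) + \gamma\, V(X_{k+1})

% Stationarity of the right-hand side in u_k gives a bounded policy
u_k^{*} = \bar{u}\,\tanh\!\left( -\frac{\gamma}{2\bar{u}}\, R^{-1} G(X_k)^{\top}
          \frac{\partial V^{*}(X_{k+1})}{\partial X_{k+1}} \right)
```

Under these assumptions the policy depends on $G$ but not on $F$, which is consistent with the abstract's statement that the drift dynamics need not be known, and the $\tanh$ keeps every input component within $\pm\bar{u}$. In an actor-critic scheme, the critic NN would approximate $V$ and the actor NN would approximate $u^{*}$.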


Similar articles

1
Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems.
IEEE Trans Neural Netw Learn Syst. 2015 Jan;26(1):140-51. doi: 10.1109/TNNLS.2014.2358227. Epub 2014 Oct 8.
2
A policy iteration approach to online optimal control of continuous-time constrained-input systems.
ISA Trans. 2013 Sep;52(5):611-21. doi: 10.1016/j.isatra.2013.04.004. Epub 2013 May 24.
3
Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks.
IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.
4
Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems.
IEEE Trans Cybern. 2014 Dec;44(12):2706-18. doi: 10.1109/TCYB.2014.2313915. Epub 2014 Jul 28.
5
Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):994-1001. doi: 10.1109/TSMCB.2008.926607.
6
Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update.
IEEE Trans Neural Netw Learn Syst. 2012 Jul;23(7):1118-29. doi: 10.1109/TNNLS.2012.2196708.
7
Adaptive optimal trajectory tracking control of AUVs based on reinforcement learning.
ISA Trans. 2023 Jun;137:122-132. doi: 10.1016/j.isatra.2022.12.003. Epub 2022 Dec 8.
8
Neural network-based finite-horizon optimal control of uncertain affine nonlinear discrete-time systems.
IEEE Trans Neural Netw Learn Syst. 2015 Mar;26(3):486-99. doi: 10.1109/TNNLS.2014.2315646.
9
Adaptive near-optimal neuro controller for continuous-time nonaffine nonlinear systems with constrained input.
Neural Netw. 2017 Sep;93:195-204. doi: 10.1016/j.neunet.2017.05.013. Epub 2017 Jun 21.
10
Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):943-9. doi: 10.1109/TSMCB.2008.926614.

Cited by

1
Application of reinforcement learning for effective vaccination strategies of coronavirus disease 2019 (COVID-19).
Eur Phys J Plus. 2021;136(5):609. doi: 10.1140/epjp/s13360-021-01620-8. Epub 2021 May 31.