
Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.

Authors

Yang Qinmin, Jagannathan Sarangapani

Affiliation

State Key Laboratory of Industrial Control Technology, Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China.

Publication

IEEE Trans Syst Man Cybern B Cybern. 2012 Apr;42(2):377-90. doi: 10.1109/TSMCB.2011.2166384. Epub 2011 Sep 23.

DOI: 10.1109/TSMCB.2011.2166384
PMID: 21947529
Abstract

In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using online approximators (OLAs) for general multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network designed to produce the optimal control signal, and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and critic networks, although any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus the separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.
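The abstract describes an adaptive critic scheme in which a critic network tunes a cost-to-go estimate online via recursive heuristic dynamic programming (HDP) equations, while an action network is driven toward the optimal control. A minimal sketch of that actor-critic loop, under an assumed scalar affine system and an arbitrary polynomial basis (the dynamics, cost weights, and learning rates below are illustrative stand-ins, not the paper's tuning laws):

```python
import numpy as np

# Illustrative HDP actor-critic loop for a scalar affine discrete-time
# system x_{k+1} = f(x) + g(x) u, with assumed f(x) = 0.8 x, g(x) = 0.2.

def features(x):
    """Polynomial basis standing in for any online approximator
    (RBFs, splines, fuzzy rules, ...)."""
    return np.array([x, x**2, x**3])

def dfeatures(x):
    """Derivative of the basis with respect to x."""
    return np.array([1.0, 2.0 * x, 3.0 * x**2])

def hdp_step(x, wc, wa, alpha_c=0.01, alpha_a=0.01, gamma=0.95):
    """One online critic/actor update along a single trajectory step."""
    u = float(wa @ features(x))               # action network output
    x_next = 0.8 * x + 0.2 * u                # assumed affine dynamics
    r = x**2 + 0.1 * u**2                     # one-step quadratic cost

    # Critic: gradient step on the Bellman residual
    #   e_c = J(x_k) - [r_k + gamma * J(x_{k+1})]
    e_c = float(wc @ features(x)) - (r + gamma * float(wc @ features(x_next)))
    wc = wc - alpha_c * e_c * features(x)

    # Actor: descend d(r + gamma * J(x_{k+1})) / du, using the chain
    # rule with dx_{k+1}/du = g(x) = 0.2.
    dJ_dxn = float(wc @ dfeatures(x_next))
    grad_u = 2.0 * 0.1 * u + gamma * dJ_dxn * 0.2
    wa = wa - alpha_a * grad_u * features(x)
    return x_next, wc, wa
```

Running this update repeatedly along a trajectory tunes both weight vectors online, with no separate identification phase; the paper's contribution is proving uniform ultimate boundedness of such a closed loop (and its output-feedback variant with an NN observer), which this toy loop does not attempt.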

Similar Articles

1. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.
   IEEE Trans Syst Man Cybern B Cybern. 2012 Apr;42(2):377-90. doi: 10.1109/TSMCB.2011.2166384. Epub 2011 Sep 23.
2. Reinforcement-learning-based dual-control methodology for complex nonlinear discrete-time systems with application to spark engine EGR operation.
   IEEE Trans Neural Netw. 2008 Aug;19(8):1369-88. doi: 10.1109/TNN.2008.2000452.
3. Reinforcement-learning-based output-feedback control of nonstrict nonlinear discrete-time systems with application to engine emission control.
   IEEE Trans Syst Man Cybern B Cybern. 2009 Oct;39(5):1162-79. doi: 10.1109/TSMCB.2009.2013272. Epub 2009 Mar 24.
4. Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks.
   IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):994-1001. doi: 10.1109/TSMCB.2008.926607.
5. Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints.
   IEEE Trans Syst Man Cybern B Cybern. 2007 Apr;37(2):425-36. doi: 10.1109/tsmcb.2006.883869.
6. Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update.
   IEEE Trans Neural Netw Learn Syst. 2012 Jul;23(7):1118-29. doi: 10.1109/TNNLS.2012.2196708.
7. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems.
   IEEE Trans Neural Netw Learn Syst. 2015 Jan;26(1):165-76. doi: 10.1109/TNNLS.2014.2360724. Epub 2014 Nov 25.
8. Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning.
   Neural Netw. 2014 Jul;55:30-41. doi: 10.1016/j.neunet.2014.03.008. Epub 2014 Mar 28.
9. A suite of robust controllers for the manipulation of microscale objects.
   IEEE Trans Syst Man Cybern B Cybern. 2008 Feb;38(1):113-25. doi: 10.1109/TSMCB.2007.909943.
10. Stochastic optimal controller design for uncertain nonlinear networked control system via neuro dynamic programming.
    IEEE Trans Neural Netw Learn Syst. 2013 Mar;24(3):471-84. doi: 10.1109/TNNLS.2012.2234133.