
Continuous-time adaptive critics.

Author information

Hanselmann Thomas, Noakes Lyle, Zaknich Anthony

Affiliation

Department of Electrical and Electronic Engineering, the University of Melbourne, Parkville, Vic. 3010, Australia.

Publication information

IEEE Trans Neural Netw. 2007 May;18(3):631-47. doi: 10.1109/TNN.2006.889499.

DOI: 10.1109/TNN.2006.889499
PMID: 17526332

Abstract

A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. Practical benefits are that this framework fits in well with plant descriptions given by differential equations and that any standard integration routine with adaptive step-size performs adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence for a general plant and critic. Also, a fast critic update for concurrent actor-critic training is introduced to immediately apply necessary adjustments of critic parameters induced by actor updates to keep the Bellman optimality correct to first-order approximation after actor changes. Thus, critic and actor updates may be performed at the same time until a substantial error builds up in the Bellman optimality or temporal difference equation, at which point traditional critic training needs to be performed before another interval of concurrent actor-critic training may resume.
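
The second-order actor adaptation mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the authors' algorithm. It assumes a scalar linear plant dx/dt = a*x + b*u, a quadratic cost rate q*x^2 + rho*u^2, a fixed quadratic critic J(x) = p*x^2, and a one-parameter actor u = -k*x; all of these, and every name in the code, are hypothetical choices made for illustration. One Newton step on the gain k, using finite-difference derivatives of the cost rate plus critic gradient times plant dynamics, is compared with one plain gradient step.

```python
import math

# Hypothetical toy setup (not from the paper): scalar plant, quadratic cost.
A, B, Q, RHO = -1.0, 1.0, 1.0, 1.0
# Assumed critic weight p, chosen to solve the scalar Riccati equation
# q + 2*a*p - (b*p)**2 / rho = 0 for the values above.
P = math.sqrt(2.0) - 1.0


def hamiltonian(k, x, p=P):
    """Cost rate plus critic gradient times plant dynamics for actor u = -k*x."""
    u = -k * x
    dJdx = 2.0 * p * x        # gradient of the assumed critic J(x) = p*x**2
    xdot = A * x + B * u      # assumed plant dynamics
    return Q * x**2 + RHO * u**2 + dJdx * xdot


def derivatives(k, x, eps=1e-3):
    """Central finite-difference first and second derivatives with respect to k."""
    h0 = hamiltonian(k, x)
    hp = hamiltonian(k + eps, x)
    hm = hamiltonian(k - eps, x)
    return (hp - hm) / (2.0 * eps), (hp - 2.0 * h0 + hm) / eps**2


def newton_actor_step(k, x):
    """Second-order update: divide the gradient by the curvature."""
    grad, hess = derivatives(k, x)
    return k - grad / hess


def gradient_actor_step(k, x, lr=0.1):
    """First-order update with a fixed learning rate, for comparison."""
    grad, _ = derivatives(k, x)
    return k - lr * grad


k0, x0 = 2.0, 1.0                 # arbitrary initial gain and sample state
k_star = P * B / RHO              # minimizing gain for this toy problem
k_newton = newton_actor_step(k0, x0)
k_grad = gradient_actor_step(k0, x0)
print("Newton error:  ", abs(k_newton - k_star))
print("Gradient error:", abs(k_grad - k_star))
```

Because the toy objective is quadratic in k, the Newton step reaches the minimizing gain in a single iteration, while the gradient step only shrinks the error by a fixed fraction. For a general plant and critic, several Newton iterations would typically be needed.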

Similar articles

1. Continuous-time adaptive critics.
   IEEE Trans Neural Netw. 2007 May;18(3):631-47. doi: 10.1109/TNN.2006.889499.
2. Efficient model learning methods for actor-critic control.
   IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):591-602. doi: 10.1109/TSMCB.2011.2170565. Epub 2011 Dec 7.
3. Robust adaptive gradient-descent training algorithm for recurrent neural networks in discrete time domain.
   IEEE Trans Neural Netw. 2008 Nov;19(11):1841-53. doi: 10.1109/TNN.2008.2001923.
4. Neural-network-based nonlinear adaptive dynamical decoupling control.
   IEEE Trans Neural Netw. 2007 May;18(3):921-5. doi: 10.1109/TNN.2007.891588.
5. Deterministic learning and rapid dynamical pattern recognition.
   IEEE Trans Neural Netw. 2007 May;18(3):617-30. doi: 10.1109/TNN.2006.889496.
6. Performance of the Bayesian online algorithm for the perceptron.
   IEEE Trans Neural Netw. 2007 May;18(3):902-5. doi: 10.1109/TNN.2007.891189.
7. Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control.
   IEEE Trans Neural Netw Learn Syst. 2012 Dec;23(12):1884-95. doi: 10.1109/TNNLS.2012.2217349.
8. Decision feedback recurrent neural equalization with fast convergence rate.
   IEEE Trans Neural Netw. 2005 May;16(3):699-708. doi: 10.1109/TNN.2005.845142.
9. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks.
   IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.
10. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems.
    Neural Netw. 2009 Apr;22(3):237-46. doi: 10.1016/j.neunet.2009.03.008. Epub 2009 Mar 26.

Cited by

1. Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning.
   Comput Intell Neurosci. 2016;2016:4824072. doi: 10.1155/2016/4824072. Epub 2016 Oct 3.