Autonomous reinforcement learning with experience replay.

Affiliation

Warsaw University of Technology, Institute of Control and Computation Engineering, Poland.

Publication information

Neural Netw. 2013 May;41:156-67. doi: 10.1016/j.neunet.2012.11.007. Epub 2012 Nov 29.

DOI:10.1016/j.neunet.2012.11.007
PMID:23237972
Abstract

This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay, whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and a half-cheetah demonstrates the feasibility of the proposed algorithm for solving difficult learning control problems autonomously within a reasonably short time.
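
As a rough illustration of the idea in the abstract (re-using previously collected samples for several policy adjustments per real control step), the following is a minimal sketch and not the paper's algorithm. The toy one-dimensional plant, the linear actor and quadratic critic, the function name `run`, and all constants are assumptions made for this example. In particular, the step-sizes `alpha` and `beta` are fixed by hand here, whereas the paper estimates them on-line; the truncated density ratio is a common way to correct for replayed samples having been generated by an older policy.

```python
import math
import random
from collections import deque


def clip(v, lo=-1.0, hi=1.0):
    """Keep a value inside [lo, hi]."""
    return max(lo, min(hi, v))


def run(steps=300, replays=4, capacity=100, seed=0):
    """Toy actor-critic with experience replay (illustrative assumptions only)."""
    rng = random.Random(seed)
    theta, w = 0.0, 0.0                    # actor: mean action = theta * x; critic: V(x) = w * x^2
    sigma, gamma = 0.3, 0.9                # exploration noise and discount (assumed values)
    alpha, beta, rho_max = 0.1, 0.01, 2.0  # hand-picked step-sizes and truncation bound
    var = sigma * sigma
    buffer = deque(maxlen=capacity)        # oldest transitions are dropped first
    x = 1.0
    for _ in range(steps):
        # One real interaction with the hypothetical plant x' = x + 0.5 * a.
        mean = theta * x
        a = mean + sigma * rng.gauss(0.0, 1.0)
        x_next = clip(x + 0.5 * clip(a))
        reward = -x_next * x_next          # goal: drive the state to zero
        buffer.append((x, a, mean, reward, x_next))
        # Restart from a random state once the goal region is reached.
        x = x_next if abs(x_next) > 0.05 else rng.uniform(-1.0, 1.0)

        # Several replayed updates per real step, drawn uniformly from the buffer.
        for _ in range(replays):
            s, act, old_mean, r, s_next = buffer[rng.randrange(len(buffer))]
            new_mean = theta * s
            d_old, d_new = act - old_mean, act - new_mean
            # Truncated density ratio: current policy over the policy that produced the sample.
            log_ratio = (d_old * d_old - d_new * d_new) / (2.0 * var)
            rho = math.exp(min(log_ratio, math.log(rho_max)))
            delta = r + gamma * w * s_next * s_next - w * s * s  # TD error
            w += alpha * rho * delta * s * s                     # critic step
            theta += beta * rho * delta * d_new / var * s        # actor step
    return theta, w, len(buffer)


if __name__ == "__main__":
    theta, w, n = run()
    print(f"theta={theta:.3f}  w={w:.3f}  stored transitions={n}")
```

Raising `replays` extracts more updates from each real sample, which is the efficiency argument for replay; how far that can be pushed with fixed step-sizes before learning becomes unstable is exactly the tuning burden the paper aims to remove.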

Similar articles

1
Autonomous reinforcement learning with experience replay.
Neural Netw. 2013 May;41:156-67. doi: 10.1016/j.neunet.2012.11.007. Epub 2012 Nov 29.
2
Real-time reinforcement learning by sequential Actor-Critics and experience replay.
Neural Netw. 2009 Dec;22(10):1484-97. doi: 10.1016/j.neunet.2009.05.011. Epub 2009 May 31.
3
Reinforcement learning of motor skills with policy gradients.
Neural Netw. 2008 May;21(4):682-97. doi: 10.1016/j.neunet.2008.02.003. Epub 2008 Apr 26.
4
Parameter-exploring policy gradients.
Neural Netw. 2010 May;23(4):551-9. doi: 10.1016/j.neunet.2009.12.004. Epub 2009 Dec 16.
5
A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.
Biosystems. 2004 Nov;77(1-3):109-17. doi: 10.1016/j.biosystems.2004.05.001.
6
Robust reinforcement learning control using integral quadratic constraints for recurrent neural networks.
IEEE Trans Neural Netw. 2007 Jul;18(4):993-1002. doi: 10.1109/TNN.2007.899520.
7
Impedance learning for robotic contact tasks using natural actor-critic algorithm.
IEEE Trans Syst Man Cybern B Cybern. 2010 Apr;40(2):433-43. doi: 10.1109/TSMCB.2009.2026289. Epub 2009 Aug 18.
8
Efficient model learning methods for actor-critic control.
IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):591-602. doi: 10.1109/TSMCB.2011.2170565. Epub 2011 Dec 7.
9
Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks.
IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.
10
Acceleration of reinforcement learning by policy evaluation using nonstationary iterative method.
IEEE Trans Cybern. 2014 Dec;44(12):2696-705. doi: 10.1109/TCYB.2014.2313655. Epub 2014 Apr 10.