Warsaw University of Technology, Institute of Control and Computation Engineering, Poland.

Neural Netw. 2013 May;41:156-67. doi: 10.1016/j.neunet.2012.11.007. Epub 2012 Nov 29.

This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy using previously collected samples and autonomously estimates appropriate step-sizes for the learning updates. The algorithm is based on actor-critic with experience replay, whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm for solving difficult learning control problems autonomously within a reasonably short time.
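To make the idea of reusing previously collected samples concrete, here is a minimal Python sketch of an actor-critic style update with experience replay. It is not the algorithm from the paper: the one-step toy task, the names `train`, `reward`, and `density_ratio`, the Gaussian policy with fixed `SIGMA`, the truncated importance weights, and the fixed step sizes `actor_step` and `critic_step` are all assumptions made for illustration. In particular, the autonomous step-size estimation described in the abstract is not reproduced here.

```python
import math
import random
from collections import deque

SIGMA = 0.5   # fixed exploration noise of the Gaussian policy (assumed)
TARGET = 1.0  # hypothetical optimum of the toy reward


def reward(action):
    """Toy one-step task: the reward peaks when the action equals TARGET."""
    return -(action - TARGET) ** 2


def density_ratio(action, mean_now, mean_then):
    """pi_now(action) / pi_then(action) for two Gaussians sharing SIGMA."""
    sq_now = (action - mean_now) ** 2
    sq_then = (action - mean_then) ** 2
    return math.exp(-(sq_now - sq_then) / (2 * SIGMA ** 2))


def train(steps=2000, replays=4, capacity=50, seed=0,
          actor_step=0.002, critic_step=0.05, clip=2.0):
    """Run the toy learner and return the final policy mean."""
    rng = random.Random(seed)
    buffer = deque(maxlen=capacity)  # replay buffer of recent samples
    theta = 0.0                      # actor: mean of the Gaussian policy
    baseline = 0.0                   # critic: running estimate of the reward
    for _ in range(steps):
        # Act once, then store the sample with the policy mean that drew it.
        action = rng.gauss(theta, SIGMA)
        buffer.append((action, reward(action), theta))
        # Replay several stored samples for each real interaction.
        for _ in range(replays):
            a, r, theta_then = rng.choice(buffer)
            # Truncated importance weight corrects for the policy change
            # since the sample was collected.
            w = min(density_ratio(a, theta, theta_then), clip)
            advantage = r - baseline
            # Policy-gradient step on the mean of the Gaussian policy.
            theta += actor_step * w * advantage * (a - theta) / SIGMA ** 2
            # Critic step toward the observed reward.
            baseline += critic_step * w * advantage
    return theta


if __name__ == "__main__":
    print(train())
```

Each real interaction is followed by `replays` updates drawn from the buffer, which is what lets a learner of this kind extract more from limited experience. The step sizes here are hand-picked constants, which is exactly the manual tuning the paper aims to remove.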


Similar articles

Autonomous reinforcement learning with experience replay.

Neural Netw. 2013 May;41:156-67. doi: 10.1016/j.neunet.2012.11.007. Epub 2012 Nov 29.

Real-time reinforcement learning by sequential Actor-Critics and experience replay.

Neural Netw. 2009 Dec;22(10):1484-97. doi: 10.1016/j.neunet.2009.05.011. Epub 2009 May 31.

Reinforcement learning of motor skills with policy gradients.

Neural Netw. 2008 May;21(4):682-97. doi: 10.1016/j.neunet.2008.02.003. Epub 2008 Apr 26.

Parameter-exploring policy gradients.

Neural Netw. 2010 May;23(4):551-9. doi: 10.1016/j.neunet.2009.12.004. Epub 2009 Dec 16.

A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.

Biosystems. 2004 Nov;77(1-3):109-17. doi: 10.1016/j.biosystems.2004.05.001.

Robust reinforcement learning control using integral quadratic constraints for recurrent neural networks.

IEEE Trans Neural Netw. 2007 Jul;18(4):993-1002. doi: 10.1109/TNN.2007.899520.

Impedance learning for robotic contact tasks using natural actor-critic algorithm.

IEEE Trans Syst Man Cybern B Cybern. 2010 Apr;40(2):433-43. doi: 10.1109/TSMCB.2009.2026289. Epub 2009 Aug 18.

Efficient model learning methods for actor-critic control.

IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):591-602. doi: 10.1109/TSMCB.2011.2170565. Epub 2011 Dec 7.

Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks.

IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.

Acceleration of reinforcement learning by policy evaluation using nonstationary iterative method.

IEEE Trans Cybern. 2014 Dec;44(12):2696-705. doi: 10.1109/TCYB.2014.2313655. Epub 2014 Apr 10.

Autonomous reinforcement learning with experience replay.
