Warsaw University of Technology, Institute of Control and Computation Engineering, Poland.
Neural Netw. 2013 May;41:156-67. doi: 10.1016/j.neunet.2012.11.007. Epub 2012 Nov 29.
This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy using previously collected samples and autonomously estimates appropriate step-sizes for the learning updates. The algorithm is based on actor-critic with experience replay, with step-sizes determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates that the proposed algorithm can solve difficult learning control problems autonomously within a reasonably short time.
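To make the idea of "repeatedly adjusting the policy with previously collected samples" concrete, here is a minimal, hypothetical sketch of a generic actor-critic with experience replay. It does not reproduce the paper's method: it uses fixed, hand-picked step sizes (`actor_lr`, `critic_lr`) instead of the enhanced fixed point step-size estimation, a linear Gaussian policy instead of neural networks, and a one-step toy task with reward -(a + s)^2. The truncation bound `RHO_MAX`, the exploration noise `SIGMA`, the critic features and all function names are assumptions chosen for illustration. Stored samples carry the log-density of the policy that generated them, so they can be reused later with truncated importance weights.

```python
import math
import random
from collections import deque

SIGMA = 0.3      # fixed exploration noise of the Gaussian policy (assumption)
RHO_MAX = 5.0    # truncation bound for importance weights (assumption)


def features(s):
    """Hypothetical critic features: V(s) = w[0] + w[1] * s**2."""
    return (1.0, s * s)


def log_density(a, mean):
    """Log-density of N(mean, SIGMA^2) at a (the constant cancels in ratios)."""
    return -((a - mean) ** 2) / (2 * SIGMA ** 2) - math.log(SIGMA * math.sqrt(2 * math.pi))


def train(steps=3000, replays_per_step=4, buffer_size=1000,
          actor_lr=0.005, critic_lr=0.05, seed=0):
    """Toy actor-critic with experience replay and FIXED step sizes (sketch)."""
    rng = random.Random(seed)
    theta = 0.0                          # actor: policy mean = theta * s
    w = [0.0, 0.0]                       # critic weights
    buffer = deque(maxlen=buffer_size)   # oldest samples are dropped first

    for _ in range(steps):
        # 1. Act with the current policy; store the sample with the
        #    log-density of the behaviour policy that produced it.
        s = rng.uniform(-1.0, 1.0)
        mean = theta * s
        a = rng.gauss(mean, SIGMA)
        r = -(a + s) ** 2                # toy reward, maximised by a = -s
        buffer.append((s, a, r, log_density(a, mean)))

        # 2. Replay: reuse stored samples with truncated importance weights.
        for _ in range(replays_per_step):
            s, a, r, logp_behaviour = rng.choice(buffer)
            mean = theta * s
            log_rho = log_density(a, mean) - logp_behaviour
            rho = math.exp(min(log_rho, math.log(RHO_MAX)))
            phi = features(s)
            advantage = r - (w[0] * phi[0] + w[1] * phi[1])
            # Critic: importance-weighted LMS step toward the observed reward.
            w = [wi + critic_lr * rho * advantage * fi for wi, fi in zip(w, phi)]
            # Actor: d/dtheta log N(a; theta*s, SIGMA^2) = (a - theta*s) * s / SIGMA^2
            theta += actor_lr * rho * advantage * (a - mean) * s / SIGMA ** 2

    return theta, w, len(buffer)


if __name__ == "__main__":
    theta, w, n = train()
    print(f"theta={theta:.3f} critic={w} buffer={n}")
```

With the default arguments, `train()` should drive `theta` toward -1, the optimum of this toy reward, while each stored sample is reused several times. In the paper, the step sizes are instead estimated on-line rather than fixed in advance as they are here.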