Department of Computer Science, The University of Tokyo, Japan.
Department of Brain Robot Interface, ATR Computational Neuroscience Laboratory, Japan.
Neural Netw. 2016 Dec;84:1-16. doi: 10.1016/j.neunet.2016.08.005. Epub 2016 Aug 24.
The goal of reinforcement learning is to learn an optimal policy that controls an agent so as to maximize the cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data, which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks, including real humanoid robot control.
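The abstract describes a three-step pipeline: reduce the state dimension, estimate a transition model on the reduced state, and then search for a policy using the learned model. The following Python sketch illustrates that pipeline under stated assumptions only: PCA and ridge regression stand in for LSCE's simultaneous dimension-reduction and model-estimation step, a random-shooting planner stands in for the paper's policy search, and the toy linear dynamics and quadratic reward are hypothetical, not taken from the paper.

```python
# Minimal sketch of model-based RL with dimension reduction.
# PCA + ridge regression are stand-ins for LSCE; dynamics, reward,
# and the random-shooting planner are illustrative assumptions.
import numpy as np


def collect_transitions(n, state_dim, action_dim, rng):
    """Generate toy transitions (s, a, s') from assumed linear dynamics with noise."""
    A = 0.9 * np.eye(state_dim)                      # assumed toy dynamics
    B = 0.1 * rng.normal(size=(state_dim, action_dim))
    S = rng.normal(size=(n, state_dim))
    U = rng.uniform(-1.0, 1.0, size=(n, action_dim))
    S_next = S @ A.T + U @ B.T + 0.01 * rng.normal(size=(n, state_dim))
    return S, U, S_next


def fit_projection(S, k):
    """PCA projection to k dimensions (stand-in for LSCE's learned subspace)."""
    mean = S.mean(axis=0)
    _, _, Vt = np.linalg.svd(S - mean, full_matrices=False)
    return mean, Vt[:k].T                            # (state_dim x k) projection


def fit_transition_model(Z, U, S_next, lam=1e-3):
    """Ridge regression from (reduced state, action) to next state."""
    X = np.hstack([Z, U, np.ones((len(Z), 1))])
    G = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(G, X.T @ S_next)


def predict_next(theta, z, u):
    return np.concatenate([z, u, [1.0]]) @ theta


def plan_action(theta, mean, W, s, horizon, n_samples, rng, reward_fn, action_dim):
    """Random-shooting planner: return the first action of the best sampled sequence."""
    best_ret, best_action = -np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s_sim, ret = s.copy(), 0.0
        for u in actions:
            z = (s_sim - mean) @ W                   # reduce state before prediction
            s_sim = predict_next(theta, z, u)
            ret += reward_fn(s_sim, u)
        if ret > best_ret:
            best_ret, best_action = ret, actions[0]
    return best_action


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state_dim, action_dim, k = 20, 2, 3              # high-dimensional state, low intrinsic dim
    S, U, S_next = collect_transitions(2000, state_dim, action_dim, rng)
    mean, W = fit_projection(S, k)
    theta = fit_transition_model((S - mean) @ W, U, S_next)
    reward = lambda s, u: -np.sum(s ** 2) - 0.01 * np.sum(u ** 2)  # assumed quadratic cost
    a = plan_action(theta, mean, W, S[0], horizon=10, n_samples=200,
                    rng=rng, reward_fn=reward, action_dim=action_dim)
    print("planned first action:", a)
```

The point of the sketch is the data efficiency argument in the abstract: the transition model is fit on a k-dimensional representation rather than the full state, so fewer samples are needed for an accurate model; LSCE's contribution is learning that representation jointly with the model rather than fixing it in advance as PCA does here.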