Eppe Manfred, Nguyen Phuong D H, Wermter Stefan
Department of Informatics, Knowledge Technology Institute, Universität Hamburg, Hamburg, Germany.
Front Robot AI. 2019 Nov 26;6:123. doi: 10.3389/frobt.2019.00123. eCollection 2019.
Reinforcement learning is generally accepted to be an appropriate and successful method to learn robot control. Symbolic action planning is useful to resolve causal dependencies and to break a causally complex problem down into a sequence of simpler high-level actions. A problem with the integration of both approaches is that action planning is based on discrete high-level state- and action spaces, whereas reinforcement learning is usually driven by a continuous reward function. Recent advances in model-free reinforcement learning, specifically, universal value function approximators and hindsight experience replay, have focused on goal-independent methods based on sparse rewards that are only given at the end of a rollout, and only if the goal has been fully achieved. In this article, we build on these novel methods to facilitate the integration of action planning with model-free reinforcement learning. Specifically, the paper demonstrates how reward sparsity can serve as a bridge between the high-level and low-level state- and action spaces. As a result, we demonstrate that the integrated method is able to solve robotic tasks that involve non-trivial causal dependencies under noisy conditions, exploiting both data and knowledge.
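The following minimal sketch (not the authors' implementation) illustrates the mechanisms the abstract refers to: a sparse, goal-conditioned reward that is only given when the goal is achieved, hindsight experience replay relabeling, and a loop that hands the subgoals produced by a symbolic plan to a goal-conditioned low-level policy. All names (sparse_reward, her_relabel, execute_plan, GOAL_TOLERANCE) and the env/policy interfaces are illustrative assumptions, not taken from the paper.

```python
import numpy as np

GOAL_TOLERANCE = 0.05  # assumed distance threshold for "goal achieved"

def sparse_reward(achieved_goal, desired_goal, tol=GOAL_TOLERANCE):
    """Sparse binary reward: 0.0 if the goal is achieved, -1.0 otherwise."""
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) <= tol else -1.0

def her_relabel(episode, k=4, rng=np.random.default_rng(0)):
    """Hindsight experience replay: re-store each transition with goals that
    were actually achieved later in the episode, so that failed rollouts
    still yield transitions with non-negative reward."""
    relabeled = []
    for t, (obs, action, achieved, desired, next_achieved) in enumerate(episode):
        # original transition with the originally desired goal
        relabeled.append((obs, action, desired,
                          sparse_reward(next_achieved, desired)))
        # k additional transitions relabeled with 'future' achieved goals
        for ft in rng.integers(t, len(episode), size=k):
            new_goal = episode[ft][4]
            relabeled.append((obs, action, new_goal,
                              sparse_reward(next_achieved, new_goal)))
    return relabeled

def execute_plan(env, policy, subgoals, max_steps=50):
    """Bridge from symbolic plan to control: each high-level action is mapped
    to a subgoal, which conditions the low-level policy; the sparse reward
    signals when the subgoal is reached and the plan can advance."""
    obs = env.reset()                         # hypothetical env interface
    for goal in subgoals:                     # high-level actions -> subgoals
        for _ in range(max_steps):            # low-level control loop
            action = policy(obs, goal)
            obs, achieved = env.step(action)  # assumed to return achieved goal
            if sparse_reward(achieved, goal) == 0.0:
                break                         # subgoal reached, next plan step
```

In this reading, reward sparsity acts as the bridge described in the abstract: the symbolic planner only needs to emit goals, and the low-level learner only needs a binary success signal per goal, without any hand-crafted shaped reward.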