Ohigashi Yu, Omori Takashi
Graduate School of Information Science, Hokkaido University, Kita 14 Jyou Nishi 9 Chome, Kita, Sapporo, Hokkaido, Japan.
Neural Netw. 2006 Oct;19(8):1169-80. doi: 10.1016/j.neunet.2006.05.037. Epub 2006 Sep 20.
Traditional reinforcement learning (RL) assumes a single, albeit complex, task to be solved. When an RL agent faces a task similar to one it has already learned, it must re-learn the task from scratch because it does not reuse previously learned results. This is the problem of quick action learning, which underlies decision making in the real world. In this paper, we consider agents that solve a set of mutually similar tasks in a multiple-task environment, where various problems are encountered one after another, and we propose an action-learning technique that quickly solves similar tasks by reusing previously learned knowledge. In our method, a model-based RL agent uses a task model constructed by combining primitive local predictors that predict task and environmental dynamics. To evaluate the proposed method, we performed a computer simulation using a simple ping-pong game with variations.
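The core idea of the abstract, building a task model by combining primitive local predictors of the dynamics, can be sketched as follows. This is a minimal illustration under assumed details: the class names (`LocalPredictor`, `CompositeModel`), the linear form of each predictor, and the Gaussian responsibility weighting are all illustrative choices, not the authors' implementation.

```python
import numpy as np

class LocalPredictor:
    """One-step linear predictor valid near a reference state.

    Illustrative stand-in for a 'primitive local predictor';
    the paper does not specify this exact form.
    """
    def __init__(self, center, A, b, radius=1.0):
        self.center = np.asarray(center, dtype=float)
        self.A = np.asarray(A, dtype=float)  # assumed local linear dynamics
        self.b = np.asarray(b, dtype=float)
        self.radius = radius

    def weight(self, state):
        # Gaussian responsibility: near 1 close to the predictor's
        # region of validity, decays with distance from its center.
        d = np.linalg.norm(np.asarray(state, dtype=float) - self.center)
        return np.exp(-(d / self.radius) ** 2)

    def predict(self, state):
        return self.A @ np.asarray(state, dtype=float) + self.b


class CompositeModel:
    """Task model: responsibility-weighted mix of local predictors.

    A model-based RL planner could query predict() to roll out
    candidate actions; reusing predictors across similar tasks is
    what allows the quick adaptation the abstract describes.
    """
    def __init__(self, predictors):
        self.predictors = predictors

    def predict(self, state):
        w = np.array([p.weight(state) for p in self.predictors])
        w = w / w.sum()  # normalize responsibilities
        preds = np.array([p.predict(state) for p in self.predictors])
        return w @ preds  # weighted combination of local predictions
```

For example, with two local predictors centered at distant states, a query near one center is dominated by that predictor's output, while intermediate states blend the two.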