IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2259-2270. doi: 10.1109/TNNLS.2017.2690910. Epub 2017 Apr 17.
The reinforcement learning (RL) paradigm allows agents to solve tasks through trial-and-error learning. To be capable of efficient, long-term learning, RL agents should be able to apply knowledge gained in the past to new tasks they may encounter in the future. The ability to predict actions' consequences may facilitate such knowledge transfer. We consider here domains where an RL agent has access to two kinds of information: agent-centric information with constant semantics across tasks, and environment-centric information, which is necessary to solve the task, but with semantics that differ between tasks. For example, in robot navigation, environment-centric information may include the robot's geographic location, while agent-centric information may include sensor readings of various nearby obstacles. We propose that these situations provide an opportunity for a very natural style of knowledge transfer, in which the agent learns to predict actions' environmental consequences using agent-centric information. These predictions contain important information about the affordances and dangers present in a novel environment, and can effectively transfer knowledge from agent-centric to environment-centric learning systems. Using several example problems including spatial navigation and network routing, we show that our knowledge transfer approach can allow faster and lower cost learning than existing alternatives.
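The transfer idea in the abstract can be illustrated with a minimal sketch. The scenario below is hypothetical (not the paper's actual experiments): 1-D corridor worlds where the agent's integer position is environment-centric (its meaning changes between corridors), while a local "is the next cell blocked?" sensor reading is agent-centric (its semantics are constant). A consequence model learned from the sensor in one corridor then transfers directly to a novel corridor.

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical corridor worlds: positions are environment-centric,
# the local obstacle sensor is agent-centric.
def make_corridor(length, walls):
    """walls: set of blocked positions."""
    return {"length": length, "walls": walls}

def sensor(world, pos, action):
    """Agent-centric reading: 1 if the cell this action moves into is blocked."""
    nxt = pos + action
    return int(nxt < 0 or nxt >= world["length"] or nxt in world["walls"])

# Phase 1: in a source corridor, learn an action-consequence model mapping
# (sensor reading, action) -> empirical probability that the move fails.
model = defaultdict(lambda: [0, 0])  # key -> [failure count, try count]
src = make_corridor(10, {3, 7})
for _ in range(2000):
    pos = random.randrange(src["length"])
    action = random.choice([-1, 1])
    reading = sensor(src, pos, action)
    nxt = pos + action
    failed = int(nxt < 0 or nxt >= src["length"] or nxt in src["walls"])
    stats = model[(reading, action)]
    stats[0] += failed
    stats[1] += 1

def predicted_fail(reading, action):
    failed, tries = model[(reading, action)]
    return failed / tries if tries else 0.5

# Phase 2: in a novel corridor the positions mean something different,
# but the sensor semantics carry over, so the learned model transfers
# without retraining; here it seeds pessimistic initial Q-values for
# moves it predicts will fail.
tgt = make_corridor(6, {2})
q_init = {}
for pos in range(tgt["length"]):
    for action in (-1, 1):
        q_init[(pos, action)] = -predicted_fail(sensor(tgt, pos, action), action)
```

The sensor here is deliberately a perfect predictor of failure, so the transferred model is exact; in the paper's richer domains (spatial navigation, network routing) the agent-centric predictions are instead learned approximations that bias, rather than determine, learning in the new environment.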