Learning to Predict Consequences as a Method of Knowledge Transfer in Reinforcement Learning.

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2259-2270. doi: 10.1109/TNNLS.2017.2690910. Epub 2017 Apr 17.

Abstract

The reinforcement learning (RL) paradigm allows agents to solve tasks through trial-and-error learning. To be capable of efficient, long-term learning, RL agents should be able to apply knowledge gained in the past to new tasks they may encounter in the future. The ability to predict actions' consequences may facilitate such knowledge transfer. We consider here domains where an RL agent has access to two kinds of information: agent-centric information with constant semantics across tasks, and environment-centric information, which is necessary to solve the task, but with semantics that differ between tasks. For example, in robot navigation, environment-centric information may include the robot's geographic location, while agent-centric information may include sensor readings of various nearby obstacles. We propose that these situations provide an opportunity for a very natural style of knowledge transfer, in which the agent learns to predict actions' environmental consequences using agent-centric information. These predictions contain important information about the affordances and dangers present in a novel environment, and can effectively transfer knowledge from agent-centric to environment-centric learning systems. Using several example problems including spatial navigation and network routing, we show that our knowledge transfer approach can allow faster and lower cost learning than existing alternatives.
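
The abstract does not give the algorithm itself, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a small grid world in which the agent's position is the environment-centric information and a per-direction obstacle sensor is the agent-centric information. A counting-based predictor learns, on a source grid, how likely an action is to end in a collision given the sensor reading in that direction, and is then reused unchanged on a different target grid to screen out risky actions. All names (`ConsequencePredictor`, `train_predictor`, `safe_actions`, the grids, the 0.5 threshold) are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def blocked(grid, pos, action):
    """True if moving from pos in the given direction hits a wall or the grid edge."""
    r, c = pos[0] + MOVES[action][0], pos[1] + MOVES[action][1]
    return not (0 <= r < len(grid) and 0 <= c < len(grid[0])) or grid[r][c] == "#"


def sensors(grid, pos):
    """Agent-centric reading: one obstacle flag per direction, same meaning in every grid."""
    return tuple(blocked(grid, pos, a) for a in range(4))


class ConsequencePredictor:
    """Estimates P(collision | sensor reading in the direction of travel) by counting."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # reading -> [collisions, trials]

    def update(self, reading, collided):
        entry = self.counts[reading]
        entry[0] += int(collided)
        entry[1] += 1

    def p_collision(self, reading):
        hits, trials = self.counts.get(reading, (0, 0))
        return hits / trials if trials else 0.5  # assumed uninformed prior


def train_predictor(grid, steps, rng):
    """Random exploration on a source grid; only agent-centric data reaches the predictor."""
    predictor = ConsequencePredictor()
    free = [(r, c) for r, row in enumerate(grid)
            for c, ch in enumerate(row) if ch != "#"]
    pos = rng.choice(free)
    for _ in range(steps):
        action = rng.randrange(4)
        collided = blocked(grid, pos, action)
        predictor.update(sensors(grid, pos)[action], collided)
        if not collided:
            pos = (pos[0] + MOVES[action][0], pos[1] + MOVES[action][1])
    return predictor


def safe_actions(predictor, grid, pos, threshold=0.5):
    """Actions whose predicted collision probability is below the (assumed) threshold."""
    reading = sensors(grid, pos)
    return [a for a in range(4) if predictor.p_collision(reading[a]) < threshold]


SOURCE_GRID = ["....",
               ".##.",
               "...."]
TARGET_GRID = ["..#",
               "...",
               "#.."]

predictor = train_predictor(SOURCE_GRID, steps=2000, rng=random.Random(0))
print(safe_actions(predictor, TARGET_GRID, (0, 1)))
```

In a fuller setup, an environment-centric learner (for example, tabular Q-learning over positions) could restrict or penalize its action choices using `safe_actions`, so that it avoids collisions in the new grid before it has experienced them there. That coupling is left out here to keep the sketch short.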
