An information-theoretic approach to curiosity-driven reinforcement learning.

Authors

Susanne Still, Doina Precup

Affiliations

Information and Computer Sciences, University of Hawaii at Mānoa, Honolulu, HI 96822, USA.

Publication

Theory Biosci. 2012 Sep;131(3):139-48. doi: 10.1007/s12064-011-0142-z. Epub 2012 Jul 12.

Abstract

We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable. Optimal policies then have the form of Boltzmann-style exploration with a bonus, containing a novel exploration-exploitation trade-off which emerges naturally from the proposed optimization principle. Importantly, this exploration-exploitation trade-off persists in the optimal deterministic policy, i.e., when there is no exploration due to randomness. As a result, exploration is understood as an emergent behavior that optimizes information gain, rather than being modeled as pure randomization of action choices.
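The Boltzmann-style exploration discussed in the abstract can be illustrated with a minimal sketch: action probabilities are proportional to exp(Q(a)/T), where the temperature T controls the trade-off between exploitation (low T, near-greedy) and exploration (high T, near-uniform). This is a generic softmax policy, not the paper's full derivation; the function names and example Q-values below are illustrative.

```python
import math
import random

def boltzmann_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q(a)/T)."""
    # Subtract the max Q-value before exponentiating, for numerical stability.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(q_values, temperature=1.0):
    """Sample an action index from the Boltzmann distribution over Q-values."""
    probs = boltzmann_policy(q_values, temperature)
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]

# Illustrative Q-values: low temperature concentrates mass on the greedy
# action (exploitation); high temperature flattens the distribution
# toward uniform (exploration).
q = [1.0, 2.0, 0.5]
print(boltzmann_policy(q, temperature=0.1))   # nearly deterministic
print(boltzmann_policy(q, temperature=100.0)  # nearly uniform
      )
```

In the paper's framing, the temperature parameterizes how much expected return is traded for the coding cost of the policy; the curiosity-driven variant then adds a predictive-power bonus to the quantity inside the exponential.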

