

An information-theoretic approach to curiosity-driven reinforcement learning.

Authors

Susanne Still, Doina Precup

Affiliation

Information and Computer Sciences, University of Hawaii at Mānoa, Honolulu, HI 96822, USA.

Publication

Theory Biosci. 2012 Sep;131(3):139-48. doi: 10.1007/s12064-011-0142-z. Epub 2012 Jul 12.

DOI: 10.1007/s12064-011-0142-z
PMID: 22791268
Abstract

We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable. Optimal policies then have the form of Boltzmann-style exploration with a bonus, containing a novel exploration-exploitation trade-off which emerges naturally from the proposed optimization principle. Importantly, this exploration-exploitation trade-off persists in the optimal deterministic policy, i.e., when there is no exploration due to randomness. As a result, exploration is understood as an emerging behavior that optimizes information gain, rather than being modeled as pure randomization of action choices.
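The Boltzmann-style policy with an exploration bonus described above can be sketched numerically. The following is a minimal illustration, not the paper's actual method: the Q-values and bonus values are made up, and the paper's predictive-power term is represented here only as a generic additive bonus inside the softmax.

```python
import numpy as np

def boltzmann_policy(q, beta=1.0, bonus=None):
    """Softmax policy: pi(a) proportional to exp(beta * (Q(a) + bonus(a))).

    beta trades expected return against the coding cost of the policy:
    beta -> 0 yields the uniform (maximally exploratory) policy,
    beta -> inf yields the greedy deterministic policy.
    """
    x = np.asarray(q, dtype=float)
    if bonus is not None:
        # Hypothetical additive bonus, standing in for an information-gain term.
        x = x + np.asarray(bonus, dtype=float)
    z = beta * (x - x.max())  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Made-up Q-values for three actions in some state.
q = [1.0, 2.0, 0.5]
print(boltzmann_policy(q, beta=0.0))                        # uniform: pure exploration
print(boltzmann_policy(q, beta=5.0))                        # nearly greedy on action 1
print(boltzmann_policy(q, beta=2.0, bonus=[0.0, 0.0, 1.5])) # bonus shifts mass to action 2
```

Varying `beta` traces out the return-versus-coding-cost trade-off; adding the bonus shows how an extra term in the exponent shifts probability mass toward actions that would otherwise be neglected.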


Similar Articles

1. An information-theoretic approach to curiosity-driven reinforcement learning.
   Theory Biosci. 2012 Sep;131(3):139-48. doi: 10.1007/s12064-011-0142-z. Epub 2012 Jul 12.
2. Curiosity-driven recommendation strategy for adaptive learning via deep reinforcement learning.
   Br J Math Stat Psychol. 2020 Nov;73(3):522-540. doi: 10.1111/bmsp.12199. Epub 2020 Feb 21.
3. Contributions of expected learning progress and perceptual novelty to curiosity-driven exploration.
   Cognition. 2022 Aug;225:105119. doi: 10.1016/j.cognition.2022.105119. Epub 2022 Apr 12.
4. Computational mechanisms of curiosity and goal-directed exploration.
   Elife. 2019 May 10;8:e41703. doi: 10.7554/eLife.41703.
5. Hierarchical curiosity loops and active sensing.
   Neural Netw. 2012 Aug;32:119-29. doi: 10.1016/j.neunet.2012.02.024. Epub 2012 Feb 14.
6. Curiosity and the dynamics of optimal exploration.
   Trends Cogn Sci. 2024 May;28(5):441-453. doi: 10.1016/j.tics.2024.02.001. Epub 2024 Feb 26.
7. LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.
   Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
8. Human Variability and the Explore-Exploit Trade-Off in Recommendation.
   Cogn Sci. 2023 Apr;47(4):e13279. doi: 10.1111/cogs.13279.
9. Protection from uncertainty in the exploration/exploitation trade-off.
   J Exp Psychol Learn Mem Cogn. 2022 Apr;48(4):547-568. doi: 10.1037/xlm0000883. Epub 2021 Jun 10.
10. Pupil diameter predicts changes in the exploration-exploitation trade-off: evidence for the adaptive gain theory.
    J Cogn Neurosci. 2011 Jul;23(7):1587-96. doi: 10.1162/jocn.2010.21548. Epub 2010 Jul 28.

Cited By

1. From pixels to planning: scale-free active inference.
   Front Netw Physiol. 2025 Jun 18;5:1521963. doi: 10.3389/fnetp.2025.1521963. eCollection 2025.
2. Towards Human-Like Emergent Communication via Utility, Informativeness, and Complexity.
   Open Mind (Camb). 2025 Apr 2;9:418-451. doi: 10.1162/opmi_a_00188. eCollection 2025.
3. Complex behavior from intrinsic motivation to occupy future action-state path space.
   Nat Commun. 2024 Jul 29;15(1):6368. doi: 10.1038/s41467-024-49711-1.
4. The Reward-Complexity Trade-off in Schizophrenia.
   Comput Psychiatr. 2021 May 25;5(1):38-53. doi: 10.5334/cpsy.71. eCollection 2021.
5. Human decision making balances reward maximization and policy compression.
   PLoS Comput Biol. 2024 Apr 26;20(4):e1012057. doi: 10.1371/journal.pcbi.1012057. eCollection 2024 Apr.
6. Bayesian Reinforcement Learning With Limited Cognitive Load.
   Open Mind (Camb). 2024 Apr 3;8:395-438. doi: 10.1162/opmi_a_00132. eCollection 2024.
7. Federated inference and belief sharing.
   Neurosci Biobehav Rev. 2024 Jan;156:105500. doi: 10.1016/j.neubiorev.2023.105500. Epub 2023 Dec 5.
8. Bibliometric Analysis of Information Theoretic Studies.
   Entropy (Basel). 2022 Sep 25;24(10):1359. doi: 10.3390/e24101359.
9. Rethinking statistical learning as a continuous dynamic stochastic process, from the motor systems perspective.
   Front Neurosci. 2022 Nov 8;16:1033776. doi: 10.3389/fnins.2022.1033776. eCollection 2022.
10. Predictive maps in rats and humans for spatial navigation.
    Curr Biol. 2022 Sep 12;32(17):3676-3689.e5. doi: 10.1016/j.cub.2022.06.090. Epub 2022 Jul 20.

References

1. Efficient computation of optimal actions.
   Proc Natl Acad Sci U S A. 2009 Jul 14;106(28):11478-83. doi: 10.1073/pnas.0710743106. Epub 2009 Jul 2.
2. Reinforcement learning of motor skills with policy gradients.
   Neural Netw. 2008 May;21(4):682-97. doi: 10.1016/j.neunet.2008.02.003. Epub 2008 Apr 26.
3. How many clusters? An information-theoretic perspective.
   Neural Comput. 2004 Dec;16(12):2483-506. doi: 10.1162/0899766042321751.
4. Regularities unseen, randomness observed: levels of entropy convergence.
   Chaos. 2003 Mar;13(1):25-54. doi: 10.1063/1.1530990.
5. Predictability, complexity, and learning.
   Neural Comput. 2001 Nov;13(11):2409-63. doi: 10.1162/089976601753195969.
6. Statistical mechanics and phase transitions in clustering.
   Phys Rev Lett. 1990 Aug 20;65(8):945-948. doi: 10.1103/PhysRevLett.65.945.