Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138
J Neurosci. 2018 Aug 15;38(33):7193-7200. doi: 10.1523/JNEUROSCI.0151-18.2018. Epub 2018 Jul 13.
Reinforcement learning is the process by which an agent learns to predict long-term future reward. We now understand a great deal about the brain's reinforcement learning algorithms, but we know considerably less about the representations of states and actions over which these algorithms operate. A useful starting point is asking what kinds of representations we would want the brain to have, given the constraints on its computational architecture. Following this logic leads to the idea of the successor representation, which encodes states of the environment in terms of their predictive relationships with other states. Recent behavioral and neural studies have provided evidence for the successor representation, and computational studies have explored ways to extend the original idea. This paper reviews progress on these fronts, organizing them within a broader framework for understanding how the brain negotiates tradeoffs between efficiency and flexibility for reinforcement learning.
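For concreteness, the successor representation's "predictive relationships" can be anchored with its standard definition from the literature (Dayan, 1993); the equation below does not appear in the abstract itself. Under a fixed policy with discount factor \(\gamma \in [0,1)\), the SR of a state \(s\) is the expected discounted future occupancy of every other state \(s'\):

\[
M(s, s') \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{1}[s_t = s'] \;\middle|\; s_0 = s\,\right],
\]

so that the value function factors into predictive and reward components, \(V(s) = \sum_{s'} M(s, s')\, R(s')\). This factorization underlies the efficiency–flexibility tradeoff discussed in the paper: values can be computed cheaply from cached predictions \(M\), yet reward changes \(R\) can be accommodated without relearning the predictions.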