
The Successor Representation: Its Computational Logic and Neural Substrates.

Affiliations

Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138

Publication Information

J Neurosci. 2018 Aug 15;38(33):7193-7200. doi: 10.1523/JNEUROSCI.0151-18.2018. Epub 2018 Jul 13.

Abstract

Reinforcement learning is the process by which an agent learns to predict long-term future reward. We now understand a great deal about the brain's reinforcement learning algorithms, but we know considerably less about the representations of states and actions over which these algorithms operate. A useful starting point is asking what kinds of representations we would want the brain to have, given the constraints on its computational architecture. Following this logic leads to the idea of the successor representation, which encodes states of the environment in terms of their predictive relationships with other states. Recent behavioral and neural studies have provided evidence for the successor representation, and computational studies have explored ways to extend the original idea. This paper reviews progress on these fronts, organizing them within a broader framework for understanding how the brain negotiates tradeoffs between efficiency and flexibility for reinforcement learning.
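
The abstract's core idea, encoding states "in terms of their predictive relationships with other states", has a compact standard form (due to Dayan, 1993): the successor representation (SR) matrix M(s, s') is the expected discounted future occupancy of state s' starting from state s, so M = (I − γT)⁻¹ for transition matrix T, and value factorizes as V = Mr. The Python sketch below illustrates this formulation on a toy Markov chain; the transition matrix, reward vector, discount factor, and the helper name sr_td_update are illustrative assumptions, not taken from the paper.

```python
import numpy as np

gamma = 0.9  # discount factor

# Toy 4-state Markov chain (an assumption for illustration):
# row-stochastic transition matrix T[s, s'] = P(s' | s).
T = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
])
r = np.array([0.0, 0.0, 1.0, 0.0])  # one-step reward for each state

# Successor representation in closed form:
# M[s, s'] = E[sum_t gamma^t * 1(s_t = s') | s_0 = s] = (I - gamma*T)^{-1}.
M = np.linalg.inv(np.eye(4) - gamma * T)

# Value factorizes through the SR: V = M @ r.
V = M @ r
print("State values:", V)

# The SR can also be learned online with a TD(0)-style update after each
# observed transition s -> s_next, without ever storing T explicitly.
def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.9):
    target = np.eye(M.shape[1])[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])
    return M
```

This factorization is one way to read the efficiency/flexibility tradeoff the abstract mentions: when rewards change (for example, after devaluation), only r needs updating to revalue every state, whereas a change in transition structure still requires relearning M.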


Similar Articles

Reward-predictive representations generalize across tasks in reinforcement learning.
PLoS Comput Biol. 2020 Oct 15;16(10):e1008317. doi: 10.1371/journal.pcbi.1008317. eCollection 2020 Oct.

Multiple memory systems as substrates for multiple decision systems.
Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.

A distributional code for value in dopamine-based reinforcement learning.
Nature. 2020 Jan;577(7792):671-675. doi: 10.1038/s41586-019-1924-6. Epub 2020 Jan 15.

Cited By

Probing for consciousness in machines.
Front Artif Intell. 2025 Aug 20;8:1610225. doi: 10.3389/frai.2025.1610225. eCollection 2025.

Adaptive planning depth in human problem-solving.
R Soc Open Sci. 2025 Apr 9;12(4):241161. doi: 10.1098/rsos.241161. eCollection 2025 Apr.

Devaluing memories of reward: a case for dopamine.
Commun Biol. 2025 Feb 3;8(1):161. doi: 10.1038/s42003-024-07440-7.

References

The successor representation in human reinforcement learning.
Nat Hum Behav. 2017 Sep;1(9):680-692. doi: 10.1038/s41562-017-0180-8. Epub 2017 Aug 28.

Rethinking dopamine as generalized prediction error.
Proc Biol Sci. 2018 Nov 21;285(1891):20181645. doi: 10.1098/rspb.2018.1645.

Belief state representation in the dopamine system.
Nat Commun. 2018 May 14;9(1):1891. doi: 10.1038/s41467-018-04397-0.

The hippocampus as a predictive map.
Nat Neurosci. 2017 Nov;20(11):1643-1653. doi: 10.1038/nn.4650. Epub 2017 Oct 2.

Dopamine, Inference, and Uncertainty.
Neural Comput. 2017 Dec;29(12):3311-3326. doi: 10.1162/neco_a_01023. Epub 2017 Sep 28.

Predicting the Past, Remembering the Future.
Curr Opin Behav Sci. 2017 Oct;17:7-13. doi: 10.1016/j.cobeha.2017.05.025. Epub 2017 Jun 9.
