
Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework

Author Information

Gershman Samuel J, Daw Nathaniel D

Affiliations

Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138.

Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, New Jersey 08544.

Publication Information

Annu Rev Psychol. 2017 Jan 3;68:101-128. doi: 10.1146/annurev-psych-122414-033625. Epub 2016 Sep 2.

Abstract

We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory, the way in which knowledge about action values or world models extracted gradually from many experiences can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.
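The proposal above is stated verbally; the sketch below is an illustration only, not code from the paper, of one standard way episodic traces can support value estimation: store each visited state together with the return that followed it, and estimate the value of a new state by averaging the returns of its k most similar stored episodes, in the spirit of nearest-neighbor or kernel-based episodic control. All class and method names are hypothetical.

import numpy as np

# Illustrative only: a nearest-neighbor episodic value estimator.
class EpisodicValueStore:
    def __init__(self, k=5):
        self.k = k          # number of stored episodes to average over
        self.states = []    # feature vectors of previously visited states
        self.returns = []   # discounted returns observed after those states

    def store(self, state, ret):
        # Record one episodic trace: a state and the return that followed it.
        self.states.append(np.asarray(state, dtype=float))
        self.returns.append(float(ret))

    def value(self, state):
        # Estimate V(state) as the mean return of the k most similar stored states.
        if not self.states:
            return 0.0
        query = np.asarray(state, dtype=float)
        dists = np.linalg.norm(np.stack(self.states) - query, axis=1)
        nearest = np.argsort(dists)[:self.k]
        return float(np.mean(np.asarray(self.returns)[nearest]))

Because the estimate generalizes from a handful of stored traces to states never encountered before, a scheme of this kind speaks to points (a) and (b) above: it requires no parametric value function over the high-dimensional state space and can produce usable estimates after very little data.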


Similar Articles

Multiple memory systems as substrates for multiple decision systems.
Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.

Nutrient-Sensitive Reinforcement Learning in Monkeys.
J Neurosci. 2023 Mar 8;43(10):1714-1730. doi: 10.1523/JNEUROSCI.0752-22.2022. Epub 2023 Jan 20.

The ubiquity of model-based reinforcement learning.
Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.

Beyond dichotomies in reinforcement learning.
Nat Rev Neurosci. 2020 Oct;21(10):576-586. doi: 10.1038/s41583-020-0355-6. Epub 2020 Sep 1.

Asymmetric and adaptive reward coding via normalized reinforcement learning.
PLoS Comput Biol. 2022 Jul 21;18(7):e1010350. doi: 10.1371/journal.pcbi.1010350. eCollection 2022 Jul.

Episodic memories predict adaptive value-based decision-making.
J Exp Psychol Gen. 2016 May;145(5):548-558. doi: 10.1037/xge0000158. Epub 2016 Mar 21.

Cited By

Discounting Future Reward in an Uncertain World.
Decision (Wash D C). 2024 Apr;11(2):255-282. doi: 10.1037/dec0000219. Epub 2023 Jun 29.

References Cited in This Article

Episodic memories predict adaptive value-based decision-making.
J Exp Psychol Gen. 2016 May;145(5):548-558. doi: 10.1037/xge0000158. Epub 2016 Mar 21.

Habitual control of goal selection in humans.
Proc Natl Acad Sci U S A. 2015 Nov 10;112(45):13817-22. doi: 10.1073/pnas.1506367112. Epub 2015 Oct 12.

Evidence integration in model-based tree search.
Proc Natl Acad Sci U S A. 2015 Sep 15;112(37):11708-13. doi: 10.1073/pnas.1505483112. Epub 2015 Aug 31.

Novelty and Inductive Generalization in Human Reinforcement Learning.
Top Cogn Sci. 2015 Jul;7(3):391-415. doi: 10.1111/tops.12138. Epub 2015 Mar 23.
