Learning, Reward, and Decision Making.

Author information

O'Doherty John P, Cockburn Jeffrey, Pauli Wolfgang M

Affiliation

Division of Humanities and Social Sciences and Computation and Neural Systems Program, California Institute of Technology, Pasadena, California 91125; email:

Publication information

Annu Rev Psychol. 2017 Jan 3;68:73-100. doi: 10.1146/annurev-psych-010416-044216. Epub 2016 Sep 28.

DOI:10.1146/annurev-psych-010416-044216

PMID:27687119

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6192677/

Abstract

In this review, we summarize findings supporting the existence of multiple behavioral strategies for controlling reward-related behavior, including a dichotomy between the goal-directed or model-based system and the habitual or model-free system in the domain of instrumental conditioning and a similar dichotomy in the realm of Pavlovian conditioning. We evaluate evidence from neuroscience supporting the existence of at least partly distinct neuronal substrates contributing to the key computations necessary for the function of these different control systems. We consider the nature of the interactions between these systems and show how these interactions can lead to either adaptive or maladaptive behavioral outcomes. We then review evidence that an additional system guides inference concerning the hidden states of other agents, such as their beliefs, preferences, and intentions, in a social context. We also describe emerging evidence for an arbitration mechanism between model-based and model-free reinforcement learning, placing such a mechanism within the broader context of the hierarchical control of behavior.

Similar articles

Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates.

Psychol Rev. 2012 Jan;119(1):120-54. doi: 10.1037/a0026435.

Multiple memory systems as substrates for multiple decision systems.

Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.

Navigating complex decision spaces: Problems and paradigms in sequential choice.

Psychol Bull. 2014 Mar;140(2):466-86. doi: 10.1037/a0033455. Epub 2013 Jul 8.

The ubiquity of model-based reinforcement learning.

Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.

Speed/accuracy trade-off between the habitual and the goal-directed processes.

PLoS Comput Biol. 2011 May;7(5):e1002055. doi: 10.1371/journal.pcbi.1002055. Epub 2011 May 26.

Model-based and model-free Pavlovian reward learning: revaluation, revision, and revelation.

Cogn Affect Behav Neurosci. 2014 Jun;14(2):473-92. doi: 10.3758/s13415-014-0277-8.

The application of computational models to social neuroscience: promises and pitfalls.

Soc Neurosci. 2018 Dec;13(6):637-647. doi: 10.1080/17470919.2018.1518834. Epub 2018 Sep 12.

Multiple Systems for the Motivational Control of Behavior and Associated Neural Substrates in Humans.

Curr Top Behav Neurosci. 2016;27:291-312. doi: 10.1007/7854_2015_386.

Neural basis of reinforcement learning and decision making.

Annu Rev Neurosci. 2012;35:287-308. doi: 10.1146/annurev-neuro-062111-150512. Epub 2012 Mar 29.

Cited by

How working memory and reinforcement learning interact when avoiding punishment and pursuing reward concurrently.

J Exp Psychol Gen. 2025 Sep 1. doi: 10.1037/xge0001817.

Reward Network Activations of Win Versus Loss in a Monetary Gambling Task.

Behav Sci (Basel). 2025 Jul 22;15(8):994. doi: 10.3390/bs15080994.

Persistent representation of a prior schema in the orbitofrontal cortex facilitates learning of a conflicting schema.

bioRxiv. 2025 Mar 1:2025.02.28.640679. doi: 10.1101/2025.02.28.640679.

Investigating working memory updating processes of the human subcortex using 7T MRI.

Elife. 2025 Jun 25;13:RP97874. doi: 10.7554/eLife.97874.

Children Strategically Decide What to Practice.

Child Dev. 2025 Sep-Oct;96(5):1619-1631. doi: 10.1111/cdev.14268. Epub 2025 May 31.

Higher motivation and pleasure scores predict more reliance on model-free decision making.

Cogn Affect Behav Neurosci. 2025 May 22. doi: 10.3758/s13415-025-01302-3.

Transition ability to safe states reduces fear responses to height.

Proc Natl Acad Sci U S A. 2025 May 20;122(20):e2416920122. doi: 10.1073/pnas.2416920122. Epub 2025 May 13.

Nucleus accumbens deep brain stimulation in adult patients suffering from severe and enduring anorexia nervosa (STIMARS): protocol for a pilot study.

Front Psychiatry. 2025 Mar 20;16:1554346. doi: 10.3389/fpsyt.2025.1554346. eCollection 2025.

Using Machine Learning to Determine a Functional Classifier of Retaliation and Its Association With Aggression.

JAACAP Open. 2024 Jun 8;3(1):137-146. doi: 10.1016/j.jaacop.2024.04.007. eCollection 2025 Mar.

Contractions in human cerebellar-cortical manifold structure underlie motor reinforcement learning.

J Neurosci. 2025 Mar 18;45(18). doi: 10.1523/JNEUROSCI.2158-24.2025.
