Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates.

Affiliation

Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA.

Publication information

Psychol Rev. 2012 Jan;119(1):120-54. doi: 10.1037/a0026435.

DOI:10.1037/a0026435

PMID:22229491

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3767755/

Abstract

Recent work has given rise to the view that reward-based decision making is governed by two key controllers: a habit system, which stores stimulus-response associations shaped by past reward, and a goal-oriented system that selects actions based on their anticipated outcomes. The current literature provides a rich body of computational theory addressing habit formation, centering on temporal-difference learning mechanisms. Less progress has been made toward formalizing the processes involved in goal-directed decision making. We draw on recent work in cognitive neuroscience, animal conditioning, cognitive and developmental psychology, and machine learning to outline a new theory of goal-directed decision making. Our basic proposal is that the brain, within an identifiable network of cortical and subcortical structures, implements a probabilistic generative model of reward, and that goal-directed decision making is effected through Bayesian inversion of this model. We present a set of simulations implementing the account, which address benchmark behavioral and neuroscientific findings, and give rise to a set of testable predictions. We also discuss the relationship between the proposed framework and other models of decision making, including recent models of perceptual choice, to which our theory bears a direct connection.
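
The central idea, selecting an action by Bayesian inversion of a generative model of reward, can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a hypothetical one-step task with two actions and two outcomes, made-up probabilities, and a binary reward variable. Conditioning on reward = 1 and applying Bayes' rule yields a posterior over actions, and the most probable action is taken as the choice.

```python
# Toy illustration only: task, names, and all probabilities are assumptions,
# not values or structures taken from the paper.

# p(action): uniform prior over the two hypothetical actions.
prior = {"left": 0.5, "right": 0.5}

# p(outcome | action): assumed action-outcome contingencies.
transition = {
    "left": {"food": 0.8, "nothing": 0.2},
    "right": {"food": 0.3, "nothing": 0.7},
}

# p(reward = 1 | outcome): assumed reward probability for each outcome.
reward_prob = {"food": 0.9, "nothing": 0.1}


def posterior_over_actions(prior, transition, reward_prob):
    """Return p(action | reward = 1) via Bayes' rule."""
    # Likelihood of reward under each action, marginalizing over outcomes.
    likelihood = {
        a: sum(transition[a][o] * reward_prob[o] for o in transition[a])
        for a in prior
    }
    # Unnormalized posterior: prior times likelihood.
    joint = {a: prior[a] * likelihood[a] for a in prior}
    evidence = sum(joint.values())
    return {a: joint[a] / evidence for a in joint}


posterior = posterior_over_actions(prior, transition, reward_prob)
best_action = max(posterior, key=posterior.get)
print(posterior, best_action)
```

With these assumed numbers the posterior favors `left`, because that action makes the rewarded outcome more likely.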

Similar articles

Learning, Reward, and Decision Making.

Annu Rev Psychol. 2017 Jan 3;68:73-100. doi: 10.1146/annurev-psych-010416-044216. Epub 2016 Sep 28.

A new computational account of cognitive control over reinforcement-based decision-making: Modeling of a probabilistic learning task.

Neural Netw. 2015 Nov;71:112-23. doi: 10.1016/j.neunet.2015.08.006. Epub 2015 Aug 20.

Goal-Directed Decision Making with Spiking Neurons.

J Neurosci. 2016 Feb 3;36(5):1529-46. doi: 10.1523/JNEUROSCI.2854-15.2016.

Adaptive learning via selectionism and Bayesianism, Part I: connection between the two.

Neural Netw. 2009 Apr;22(3):220-8. doi: 10.1016/j.neunet.2009.03.018. Epub 2009 Apr 5.

Reward-modulated Hebbian learning of decision making.

Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.

Speed/accuracy trade-off between the habitual and the goal-directed processes.

PLoS Comput Biol. 2011 May;7(5):e1002055. doi: 10.1371/journal.pcbi.1002055. Epub 2011 May 26.

Neuromodulatory adaptive combination of correlation-based learning in cerebellum and reward-based learning in basal ganglia for goal-directed behavior control.

Front Neural Circuits. 2014 Oct 28;8:126. doi: 10.3389/fncir.2014.00126. eCollection 2014.

Multiple memory systems as substrates for multiple decision systems.

Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.

Reward-dependent learning in neuronal networks for planning and decision making.

Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0.

Cited by

Adaptive planning depth in human problem-solving.

R Soc Open Sci. 2025 Apr 9;12(4):241161. doi: 10.1098/rsos.241161. eCollection 2025 Apr.

A Neural Circuit Framework for Economic Choice: From Building Blocks of Valuation to Compositionality in Multitasking.

bioRxiv. 2025 Mar 13:2025.03.13.643098. doi: 10.1101/2025.03.13.643098.

The affective gradient hypothesis: an affect-centered account of motivated behavior.

Trends Cogn Sci. 2024 Dec;28(12):1089-1104. doi: 10.1016/j.tics.2024.08.003. Epub 2024 Sep 24.

A recurrent network model of planning explains hippocampal replay and human behavior.

Nat Neurosci. 2024 Jul;27(7):1340-1348. doi: 10.1038/s41593-024-01675-7. Epub 2024 Jun 7.

Cognitive, Emotional, and Daily Functioning Domains Involved in Decision-Making among Patients with Mild Cognitive Impairment: A Systematic Review.

Brain Sci. 2024 Mar 14;14(3):278. doi: 10.3390/brainsci14030278.

The construction and use of cognitive maps in model-based control.

J Exp Psychol Gen. 2024 Feb;153(2):372-385. doi: 10.1037/xge0001491. Epub 2023 Dec 7.

Design Principles for Neurorobotics.

Front Neurorobot. 2022 May 25;16:882518. doi: 10.3389/fnbot.2022.882518. eCollection 2022.

Reinforcement learning and Bayesian inference provide complementary models for the unique advantage of adolescents in stochastic reversal.

Dev Cogn Neurosci. 2022 Jun;55:101106. doi: 10.1016/j.dcn.2022.101106. Epub 2022 Apr 22.

Neuronal origins of reduced accuracy and biases in economic choices under sequential offers.

Elife. 2022 Apr 13;11:e75910. doi: 10.7554/eLife.75910.

Cognitive Control as a Multivariate Optimization Problem.

J Cogn Neurosci. 2022 Mar 5;34(4):569-591. doi: 10.1162/jocn_a_01822.
