Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA.
Psychol Rev. 2012 Jan;119(1):120-54. doi: 10.1037/a0026435.
Recent work has given rise to the view that reward-based decision making is governed by two key controllers: a habit system, which stores stimulus-response associations shaped by past reward, and a goal-oriented system that selects actions based on their anticipated outcomes. The current literature provides a rich body of computational theory addressing habit formation, centering on temporal-difference learning mechanisms. Less progress has been made toward formalizing the processes involved in goal-directed decision making. We draw on recent work in cognitive neuroscience, animal conditioning, cognitive and developmental psychology, and machine learning to outline a new theory of goal-directed decision making. Our basic proposal is that the brain, within an identifiable network of cortical and subcortical structures, implements a probabilistic generative model of reward, and that goal-directed decision making is effected through Bayesian inversion of this model. We present a set of simulations implementing the account, which address benchmark behavioral and neuroscientific findings, and give rise to a set of testable predictions. We also discuss the relationship between the proposed framework and other models of decision making, including recent models of perceptual choice, to which our theory bears a direct connection.
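The central idea of Bayesian inversion of a generative model of reward can be illustrated with a toy example. The sketch below is not the authors' model, which the abstract does not specify. It assumes a single decision with two hypothetical actions, a hypothetical table of outcome probabilities for each action, and a hypothetical probability of reward for each outcome. Conditioning on the event that reward is obtained and applying Bayes' rule yields a posterior distribution over actions, and the action with the highest posterior probability is selected.

```python
# Toy illustration of action selection by Bayesian inversion.
# All names and numbers here are hypothetical, chosen only for illustration.

def posterior_over_actions(prior, outcome_given_action, reward_given_outcome):
    """Return P(action | reward = 1) computed with Bayes' rule."""
    # Likelihood of reward under each action:
    #   P(r = 1 | a) = sum over outcomes o of P(o | a) * P(r = 1 | o)
    likelihood = {
        action: sum(p_o * reward_given_outcome[o] for o, p_o in outcomes.items())
        for action, outcomes in outcome_given_action.items()
    }
    # Unnormalized posterior: P(a) * P(r = 1 | a)
    unnormalized = {a: prior[a] * likelihood[a] for a in prior}
    # Normalizing constant: P(r = 1)
    evidence = sum(unnormalized.values())
    return {a: value / evidence for a, value in unnormalized.items()}


# Generative model: action -> outcome -> reward (assumed values).
prior = {"left": 0.5, "right": 0.5}
outcome_given_action = {
    "left": {"food": 0.8, "nothing": 0.2},
    "right": {"food": 0.3, "nothing": 0.7},
}
reward_given_outcome = {"food": 0.9, "nothing": 0.1}

posterior = posterior_over_actions(prior, outcome_given_action, reward_given_outcome)
best_action = max(posterior, key=posterior.get)
print(posterior, best_action)
```

With a uniform prior over actions, the posterior is proportional to each action's probability of yielding reward, so choosing the most probable action under the posterior coincides with choosing the action most likely to be rewarded in this one-step case.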