Department of Psychology and Neuroscience, University of Colorado Boulder, 345 UCB, Boulder, CO 80309, United States.
Neurosci Biobehav Rev. 2010 Apr;34(5):701-20. doi: 10.1016/j.neubiorev.2009.11.019. Epub 2009 Nov 26.
What biological mechanisms underlie the reward-predictive firing properties of midbrain dopaminergic (DA) neurons, and how do they relate to the complex constellation of empirical findings understood as Pavlovian and instrumental conditioning? We previously presented PVLV, a biologically inspired Pavlovian learning algorithm accounting for DA activity in terms of two interrelated systems: a primary value (PV) system, which governs how DA cells respond to an unconditioned stimulus (US; reward); and a learned value (LV) system, which governs how DA cells respond to a conditioned stimulus (CS). Here, we provide a more extensive review of the biological mechanisms supporting phasic DA firing and their relation to the spate of Pavlovian conditioning phenomena and their sensitivity to focal brain lesions. We further extend the model by incorporating a new novelty value (NV) component reflecting the ability of novel stimuli to trigger phasic DA firing, providing "novelty bonuses" that encourage exploratory working memory updating and in turn speed learning in trace conditioning and other working memory-dependent paradigms. The evolving PVLV model builds upon insights developed in many earlier computational models, especially reinforcement learning models based on the ideas of Sutton and Barto, biological models, and the psychological model developed by Savastano and Miller. The PVLV framework synthesizes these various approaches, overcoming important shortcomings of each by providing a coherent and specific mapping to much of the relevant empirical data at both the micro- and macro-levels, and examines their relevance for higher order cognitive functions.