Nasser Helen M, Calu Donna J, Schoenbaum Geoffrey, Sharpe Melissa J
Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA.
Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA; Cellular Neurobiology Research Branch, National Institute on Drug Abuse Intramural Research Program, Baltimore, MD, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA.
Front Psychol. 2017 Feb 22;8:244. doi: 10.3389/fpsyg.2017.00244. eCollection 2017.
Phasic activity of midbrain dopamine neurons is currently thought to encapsulate the prediction-error signal described in Sutton and Barto's (1981) model-free reinforcement learning algorithm. This phasic signal is thought to contain information about the quantitative value of reward, which transfers to the reward-predictive cue after learning. This is argued to endow the reward-predictive cue with the value inherent in the reward, motivating behavior toward cues signaling the presence of reward. Yet theoretical and empirical research has implicated prediction-error signaling in learning that extends far beyond a transfer of quantitative value to a reward-predictive cue. Here, we review the research which demonstrates the complexity of how dopaminergic prediction errors facilitate learning. After briefly discussing the literature demonstrating that phasic dopaminergic signals can act in the manner described by Sutton and Barto (1981), we consider how these signals may also influence attentional processing across multiple attentional systems in distinct brain circuits. Then, we discuss how prediction errors encode and promote the development of context-specific associations between cues and rewards. Finally, we consider recent evidence that shows dopaminergic activity contains information about causal relationships between cues and rewards that reflect information garnered from rich associative models of the world that can be adapted in the absence of direct experience. In discussing this research we hope to support the expansion of how dopaminergic prediction errors are thought to contribute to the learning process beyond the traditional concept of transferring quantitative value.
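The value-transfer account summarized at the start of the abstract can be illustrated with a minimal temporal-difference sketch. This is not code from the article: the function name `simulate_td`, the tabular one-state-per-time-step representation, and all parameter values are assumptions chosen for illustration. It shows only the basic pattern, in which the prediction error is large at reward delivery early in training and moves to cue onset once the cue reliably predicts the reward.

```python
def simulate_td(n_trials=200, n_steps=5, alpha=0.2, gamma=1.0, reward=1.0):
    """Tabular TD(0) on a cue -> delay -> reward trial (illustrative assumptions only).

    V[i] is the learned value of the i-th time step after cue onset.
    The reward arrives on leaving the last step. The pre-cue state is
    assumed to have a fixed value of 0, i.e. cue onset is unpredicted.
    """
    V = [0.0] * n_steps
    history = []  # one list of prediction errors per trial
    for _ in range(n_trials):
        # Error at cue onset: transition from the pre-cue state (value 0)
        # into the first cue state, with no reward delivered.
        deltas = [gamma * V[0] - 0.0]
        for i in range(n_steps):
            r = reward if i == n_steps - 1 else 0.0
            v_next = V[i + 1] if i + 1 < n_steps else 0.0
            delta = r + gamma * v_next - V[i]  # prediction error
            V[i] += alpha * delta              # value update
            deltas.append(delta)
        history.append(deltas)
    return V, history


if __name__ == "__main__":
    V, history = simulate_td()
    first, last = history[0], history[-1]
    print("first trial: cue error", first[0], "reward error", first[-1])
    print("last trial:  cue error", round(last[0], 3), "reward error", round(last[-1], 3))
```

In this sketch the error at reward delivery shrinks toward zero across trials while the error at cue onset grows toward the reward magnitude. The review argues that dopaminergic prediction errors contribute to learning in ways that go beyond this kind of quantitative value transfer.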