Brzosko Zuzanna, Schultz Wolfram, Paulsen Ole
Department of Physiology, Development and Neuroscience, Physiological Laboratory, University of Cambridge, Cambridge, United Kingdom.
eLife. 2015 Oct 30;4:e09685. doi: 10.7554/eLife.09685.
Most reinforcement learning models assume that the reward signal arrives after the activity that led to the reward, placing constraints on the possible underlying cellular mechanisms. Here we show that dopamine, a positive reinforcement signal, can retroactively convert hippocampal timing-dependent synaptic depression into potentiation. This effect requires functional NMDA receptors and is mediated in part through the activation of the cAMP/PKA cascade. Collectively, our results support the idea that reward-related signaling can act on a pre-established synaptic eligibility trace, thereby associating specific experiences with behaviorally distant, rewarding outcomes. This finding identifies a biologically plausible mechanism for solving the 'distal reward problem'.
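The eligibility-trace idea in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes, purely for illustration, that a pairing event leaves an exponentially decaying trace, that the pairing alone yields depression, and that a later dopamine signal adds a potentiating term scaled by whatever trace remains. The function names and every numerical value (`tau_s`, `depression`, `dopamine_gain`) are hypothetical and are not taken from the paper.

```python
import math


def eligibility(t_since_pairing_s, tau_s=10.0):
    """Fraction of the trace left t seconds after the pairing.

    Assumption: simple exponential decay; tau_s is an illustrative value.
    """
    return math.exp(-t_since_pairing_s / tau_s)


def weight_change(dopamine_delay_s=None, depression=-0.2,
                  dopamine_gain=0.5, tau_s=10.0):
    """Toy net synaptic change for one pairing event.

    Without dopamine the pairing yields depression (negative change).
    With dopamine arriving dopamine_delay_s seconds later, a positive
    term proportional to the remaining trace is added (hypothetical rule).
    """
    if dopamine_delay_s is None:
        return depression
    return depression + dopamine_gain * eligibility(dopamine_delay_s, tau_s)


if __name__ == "__main__":
    print("no dopamine:", weight_change())
    for delay in (0.0, 1.0, 10.0, 100.0):
        print(f"dopamine after {delay:>5.1f} s:", round(weight_change(delay), 3))
```

In this toy rule, dopamine that arrives while the trace is still large flips the net change from negative to positive, whereas dopamine that arrives long after the trace has decayed leaves the depression essentially unchanged. This mirrors, in a deliberately simplified way, how a delayed reward signal could act on a pre-established trace.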