Gurney Kevin N, Humphries Mark D, Redgrave Peter
Department of Psychology, Adaptive Behaviour Research Group, University of Sheffield, United Kingdom; INSIGNEO Institute for In Silico Medicine, University of Sheffield, United Kingdom.
Faculty of Life Sciences, University of Manchester, United Kingdom.
PLoS Biol. 2015 Jan 6;13(1):e1002034. doi: 10.1371/journal.pbio.1002034. eCollection 2015 Jan.
Operant learning requires that reinforcement signals interact with action representations at a suitable neural interface. Much evidence suggests that this occurs when phasic dopamine, acting as a reinforcement prediction error, gates plasticity at cortico-striatal synapses, and thereby changes the future likelihood of selecting the action(s) coded by striatal neurons. But this hypothesis faces serious challenges. First, cortico-striatal plasticity is inexplicably complex, depending on spike timing, dopamine level, and dopamine receptor type. Second, there is a credit assignment problem: action selection signals occur long before the consequent dopamine reinforcement signal. Third, the two types of striatal output neuron have apparently opposite effects on action selection. Whether these factors rule out the interface hypothesis and how they interact to produce reinforcement learning is unknown. We present a computational framework that addresses these challenges. We first predict the expected activity changes over an operant task for both types of action-coding striatal neuron, and show they co-operate to promote action selection in learning and compete to promote action suppression in extinction. Separately, we derive a complete model of dopamine and spike-timing dependent cortico-striatal plasticity from in vitro data. We then show this model produces the predicted activity changes necessary for learning and extinction in an operant task, a remarkable convergence of a bottom-up data-driven plasticity model with the top-down behavioural requirements of learning theory. Moreover, we show the complex dependencies of cortico-striatal plasticity are not only sufficient but necessary for learning and extinction. Validating the model, we show it can account for behavioural data describing extinction, renewal, and reacquisition, and replicate in vitro experimental data on cortico-striatal plasticity. By bridging the levels between the single synapse and behaviour, our model shows how striatum acts as the action-reinforcement interface.
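The credit assignment problem described in the abstract (an action-related spike pairing occurs long before the dopamine signal that evaluates it) is often illustrated with a generic three-factor rule, in which spike timing leaves a decaying eligibility trace that a later dopamine signal converts into a weight change. The Python sketch below is only such an illustration and is not the authors' model: the function names, the exponential STDP window, the one-second trace time constant, the learning rate, and especially the simple sign flip between D1 and D2 neurons are all assumptions made here for clarity.

```python
import math


def stdp_kernel(dt_ms, a_plus=1.0, a_minus=1.0, tau_ms=20.0):
    """Hypothetical pairwise STDP window; dt_ms = t_post - t_pre."""
    if dt_ms >= 0:
        return a_plus * math.exp(-dt_ms / tau_ms)   # pre before post
    return -a_minus * math.exp(dt_ms / tau_ms)      # post before pre


def simulate(pairings, dopamine, receptor="D1", w=0.5,
             lr=0.05, tau_e_ms=1000.0, t_end_ms=3000):
    """Toy three-factor rule, stepped in 1 ms increments.

    pairings: dict {time_ms: t_post - t_pre} of spike pairings.
    dopamine: dict {time_ms: level relative to baseline}
              (positive = phasic burst, negative = dip).
    receptor: "D1" or "D2"; the sign flip for D2 is an assumption,
              not a result taken from the paper.
    """
    sign = 1.0 if receptor == "D1" else -1.0
    decay = math.exp(-1.0 / tau_e_ms)  # per-millisecond trace decay
    trace = 0.0
    for t in range(t_end_ms):
        trace *= decay                          # eligibility fades with time
        if t in pairings:
            trace += stdp_kernel(pairings[t])   # spike timing tags the synapse
        # Dopamine gates the conversion of the trace into a weight change.
        w += lr * sign * dopamine.get(t, 0.0) * trace
    return w


if __name__ == "__main__":
    pairing = {100: 10.0}      # pre leads post by 10 ms at t = 100 ms
    burst = {1100: 1.0}        # dopamine burst arrives 1 s later
    print("D1, delayed burst:", simulate(pairing, burst, "D1"))
    print("D2, delayed burst:", simulate(pairing, burst, "D2"))
    print("D1, no dopamine:  ", simulate(pairing, {}, "D1"))
```

In this toy version the weight changes only when dopamine departs from baseline while the trace is still non-zero, so a reinforcement signal arriving a second after the pairing can still modify the synapse, and a later signal has a smaller effect.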