Physical and Health Education, Graduate School of Education, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan.
J Neurosci. 2013 May 15;33(20):8866-90. doi: 10.1523/JNEUROSCI.4614-12.2013.
Humans and animals act quickly when they expect their actions to lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that the phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays a central role in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related, or whether they can be understood in terms of how the activity of dopamine neurons is itself controlled by upstream circuits. To address this issue, we constructed a closed-circuit model of the cortico-basal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model can reproduce the distinct motivational effects observed for D1- and D2-type dopamine receptor antagonists. The model also explains the dopaminergic representation of reward prediction error observed in behaving animals during learning tasks, and can explain the distinct choice biases induced by optogenetic stimulation of D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through the notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
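For readers unfamiliar with the term, reward prediction error can be illustrated with generic textbook temporal-difference (TD) learning. The sketch below is not the closed-circuit model described in the abstract; it is a minimal tabular example in which the error at each step is the reward plus the discounted value of the new state, minus the value of the previous state. The subtracted term is one way to read the "value at a previous time point" that the abstract attributes to the indirect pathway. The function name `run_td`, the chain task, and all parameter values are assumptions made for illustration only.

```python
def run_td(n_states=5, n_trials=500, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a fixed chain of states that ends in a reward of 1.

    Hypothetical illustration only: names and parameters are not from the paper.
    """
    # One value per state, plus a terminal state whose value stays at 0.
    values = [0.0] * (n_states + 1)
    rpe_history = []  # one list of prediction errors per trial

    for _ in range(n_trials):
        trial_rpes = []
        for s in range(n_states):
            # Reward is delivered only on leaving the last state of the chain.
            reward = 1.0 if s == n_states - 1 else 0.0
            # Reward prediction error: reward + discounted value of the
            # new state - value of the previous state.
            delta = reward + gamma * values[s + 1] - values[s]
            # Move the previous state's value toward the better estimate.
            values[s] += alpha * delta
            trial_rpes.append(delta)
        rpe_history.append(trial_rpes)

    return values, rpe_history


values, rpe_history = run_td()
print("RPE at reward, first trial:", rpe_history[0][-1])
print("RPE at reward, last trial: ", rpe_history[-1][-1])
print("Learned state values:      ", values[:-1])
```

In this toy setting the error at the time of reward is large on the first trial and shrinks toward zero as the reward becomes predicted, which is the qualitative signature of reward prediction error that the abstract refers to.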