Contreras-Vidal J L, Schultz W
Motor Control Laboratory, Arizona State University, Tempe 85287-0404, USA.
J Comput Neurosci. 1999 May-Jun;6(3):191-214. doi: 10.1023/a:1008862904946.
A neural network model of how dopamine and prefrontal cortex activity guide short- and long-term information processing within the cortico-striatal circuits during reward-related learning of approach behavior is proposed. The model predicts two types of reward-related neuronal responses generated during learning: (1) cell activity signaling errors in the prediction of the expected time of reward delivery and (2) neural activations coding for errors in the prediction of the amount and type of reward or stimulus expectancies. The former type of signal is consistent with the responses of dopaminergic neurons, while the latter signal is consistent with reward expectancy responses reported in the prefrontal cortex. It is shown that a neural network architecture that satisfies the design principles of the adaptive resonance theory of Carpenter and Grossberg (1987) can account for the dopamine responses to novelty, generalization, and discrimination of appetitive and aversive stimuli. These hypotheses are scrutinized via simulations of the model in relation to the delivery of free food outside a task, the timed contingent delivery of appetitive and aversive stimuli, and an asymmetric, instructed delay response task.
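The first type of signal, an error in the prediction of the expected time of reward delivery, can be illustrated with a generic temporal-difference (TD) learner. The sketch below is not the adaptive resonance architecture described in the abstract. It swaps in a simple tapped-delay-line TD(0) rule, and every name and parameter in it (`cue_t`, `reward_t`, `alpha`, `gamma`, the trial length) is a hypothetical choice made for illustration.

```python
import numpy as np

# Illustrative assumption: a tapped-delay-line TD(0) learner, NOT the
# ART-based network of the paper. All names and parameters are hypothetical.

def value(w, t, cue_t):
    """Predicted future reward at time t; zero before the cue appears."""
    return w[t - cue_t] if t >= cue_t else 0.0

def run_trial(w, cue_t, reward_t, n_steps, alpha, gamma,
              rewarded=True, learn=True):
    """Run one trial and return the prediction error at every time step."""
    delta = np.zeros(n_steps)
    for t in range(1, n_steps):
        r = 1.0 if (rewarded and t == reward_t) else 0.0
        # TD error: reward received now, plus the discounted new prediction,
        # minus the prediction made one step earlier.
        delta[t] = r + gamma * value(w, t, cue_t) - value(w, t - 1, cue_t)
        # Only time steps after cue onset have a weight that can be adjusted.
        if learn and t - 1 >= cue_t:
            w[t - 1 - cue_t] += alpha * delta[t]
    return delta

def simulate(n_trials=500, n_steps=20, cue_t=5, reward_t=15,
             alpha=0.3, gamma=1.0):
    """Return errors for the first trial, the last trial, and an omission probe."""
    w = np.zeros(n_steps)  # one weight per time step elapsed since the cue
    first = run_trial(w, cue_t, reward_t, n_steps, alpha, gamma)
    last = first
    for _ in range(n_trials - 1):
        last = run_trial(w, cue_t, reward_t, n_steps, alpha, gamma)
    # Probe trial: reward withheld, weights frozen.
    omitted = run_trial(w, cue_t, reward_t, n_steps, alpha, gamma,
                        rewarded=False, learn=False)
    return first, last, omitted

if __name__ == "__main__":
    first, last, omitted = simulate()
    print("first trial, error at reward time:", first[15])
    print("trained, error at cue onset:", round(float(last[5]), 3))
    print("trained, error at reward time:", round(float(last[15]), 3))
    print("omission, error at reward time:", round(float(omitted[15]), 3))
```

Under these assumptions the error appears at reward delivery on the first trial, shifts to cue onset once the reward time has been learned, and turns negative at the expected reward time when the reward is withheld. This is the qualitative pattern of a timing prediction error, shown here only as a simplified stand-in for the model's behavior.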