TD models of reward predictive responses in dopamine neurons.

Author information

Suri Roland E

Affiliation

Computational Neurobiology Laboratory, The Salk Institute, San Diego, CA 92186, USA.

Publication information

Neural Netw. 2002 Jun-Jul;15(4-6):523-33. doi: 10.1016/s0893-6080(02)00046-1.

DOI:10.1016/s0893-6080(02)00046-1

PMID:12371509

Abstract

This article focuses on recent modeling studies of dopamine neuron activity and their influence on behavior. Activity of midbrain dopamine neurons is phasically increased by stimuli that increase the animal's reward expectation and is decreased below baseline levels when the reward fails to occur. These characteristics resemble the reward prediction error signal of the temporal difference (TD) model, which is a model of reinforcement learning. Computational modeling studies show that such a dopamine-like reward prediction error can serve as a powerful teaching signal for learning with delayed reinforcement, in particular for learning of motor sequences. Several lines of evidence suggest that dopamine is also involved in 'cognitive' processes that are not addressed by standard TD models. I propose the hypothesis that dopamine neuron activity is crucial for planning processes, also referred to as 'goal-directed behavior', which select actions by evaluating predictions about their motivational outcomes.
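
As a rough illustration of the TD reward prediction error mentioned in the abstract, the sketch below trains a small TD model on repeated stimulus-reward trials and then probes it with a rewarded trial and a trial in which the reward is omitted. It is not the model from the article: the one-weight-per-time-step stimulus representation, the trial timing, the learning rate `alpha`, the discount factor `gamma`, and all names are assumptions chosen for illustration.

```python
import numpy as np

def run_trial(w, rewarded, learn=True, cs_t=5, reward_t=15,
              n_steps=25, alpha=0.1, gamma=0.98):
    """Simulate one trial and return the TD error at every time step.

    w[k] is the learned reward prediction k steps after stimulus onset.
    All parameter values are illustrative assumptions.
    """
    def value(t):
        # Prediction is zero before stimulus onset and from the reward time on.
        k = t - cs_t
        return w[k] if 0 <= k < len(w) else 0.0

    delta = np.zeros(n_steps)
    for t in range(1, n_steps):
        r = 1.0 if (rewarded and t == reward_t) else 0.0
        # TD error: reward plus discounted new prediction minus old prediction.
        delta[t] = r + gamma * value(t) - value(t - 1)
        k = (t - 1) - cs_t
        if learn and 0 <= k < len(w):
            w[k] += alpha * delta[t]  # move the old prediction toward its target
    return delta

# One weight per time step between stimulus onset (t=5) and reward (t=15).
weights = np.zeros(10)

first = run_trial(weights, rewarded=True)      # naive model, first trial
for _ in range(499):                           # further training trials
    run_trial(weights, rewarded=True)

trained = run_trial(weights, rewarded=True, learn=False)   # probe: reward given
omitted = run_trial(weights, rewarded=False, learn=False)  # probe: reward omitted

print("first trial, error at reward:   ", round(first[15], 3))
print("trained, error at stimulus:     ", round(trained[5], 3))
print("trained, error at reward:       ", round(trained[15], 3))
print("trained, error at omitted reward:", round(omitted[15], 3))
```

Under these assumptions, the error on the first trial occurs at the reward itself. After training it is positive at stimulus onset, close to zero at the fully predicted reward, and negative at the time of an omitted reward, which is the qualitative pattern the abstract attributes to dopamine neuron activity.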
