Bayer Hannah M, Glimcher Paul W
Center for Neural Science, New York University, New York, NY 10003, USA.
Neuron. 2005 Jul 7;47(1):129-41. doi: 10.1016/j.neuron.2005.05.020.
The midbrain dopamine neurons are hypothesized to provide a physiological correlate of the reward prediction error signal required by current models of reinforcement learning. We examined the activity of single dopamine neurons during a task in which subjects learned by trial and error when to make an eye movement for a juice reward. We found that these neurons encoded the difference between the current reward and a weighted average of previous rewards, a reward prediction error, but only for outcomes that were better than expected. Thus, the firing rate of midbrain dopamine neurons is quantitatively predicted by theoretical descriptions of the reward prediction error signal used in reinforcement learning models for circumstances in which this signal has a positive value. We also found that the dopamine system continued to compute the reward prediction error even when the behavioral policy of the animal was only weakly influenced by this computation.
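The computation described in the abstract can be illustrated with a minimal sketch. It assumes the weighted average of previous rewards is an exponentially weighted running average with a hypothetical learning rate `alpha`, and that the neural response is a scaled, positively rectified copy of the error with a hypothetical `gain`. The function names, parameter values, and initial expectation of zero are illustrative assumptions, not the authors' fitted model.

```python
from typing import List, Sequence


def reward_prediction_errors(rewards: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Return, for each trial, the current reward minus a weighted average of earlier rewards.

    Assumption: the weighted average is an exponentially weighted running
    average updated with learning rate `alpha` (hypothetical value).
    """
    prediction = 0.0  # assumed expectation before any reward has been received
    errors = []
    for reward in rewards:
        error = reward - prediction  # reward prediction error on this trial
        errors.append(error)
        # Move the prediction toward the reward; recent trials weigh more than older ones.
        prediction += alpha * error
    return errors


def rectified_response(errors: Sequence[float], gain: float = 1.0) -> List[float]:
    """Model a response that tracks the error only when the outcome is better than expected.

    Assumption: negative errors are clipped to zero and positive errors are
    scaled linearly by `gain` (hypothetical parameter).
    """
    return [gain * max(0.0, error) for error in errors]


# Usage: two rewarded trials followed by an omitted reward.
example_errors = reward_prediction_errors([1.0, 1.0, 0.0], alpha=0.5)
print(example_errors)                      # [1.0, 0.5, -0.75]
print(rectified_response(example_errors))  # [1.0, 0.5, 0.0]
```

In this sketch the omitted reward on the third trial produces a negative error, but the rectified response stays at zero, mirroring the finding that the signal was encoded only for outcomes better than expected.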