Nakahara Hiroyuki, Itoh Hideaki, Kawagoe Reiko, Takikawa Yoriko, Hikosaka Okihide
Lab for Mathematical Neuroscience, RIKEN Brain Science Institute, Wako, Saitama, Japan.
Neuron. 2004 Jan 22;41(2):269-80. doi: 10.1016/s0896-6273(03)00869-9.
Midbrain dopamine (DA) neurons are thought to encode reward prediction error. Reward prediction can be improved if relevant context is taken into account. We found that monkey DA neurons can encode a context-dependent prediction error. In the first, noncontextual task, a light stimulus was randomly followed by reward with a fixed, equal probability. The response of DA neurons was positively correlated with the number of preceding unrewarded trials and could be simulated by a conventional temporal difference (TD) model. In the second, contextual task, a reward-indicating light stimulus was presented with a probability that, while fixed overall, increased as a function of the number of preceding unrewarded trials. The DA neuronal response was then negatively correlated with this number. This history effect corresponded to the prediction error based on the conditional probability of reward and could be simulated only by incorporating the relevant context into the TD model.
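The opposite signs of the two history effects can be illustrated with a minimal sketch. It is not the TD model used in the paper. It reduces the problem to a one-step delta rule, and every name and number in it (`conventional_errors`, `contextual_errors`, the starting prediction, the learning rate, and the conditional probabilities in `ASSUMED_COND_PROB`) is a hypothetical value chosen only for illustration. In each case the quantity computed is the prediction error that would occur if reward arrived after n consecutive unrewarded trials.

```python
def conventional_errors(v0, alpha, max_n):
    """Context-free learner: error on a rewarded trial after n unrewarded trials.

    Each unrewarded trial pulls the reward prediction v toward 0, so a
    later reward is more surprising. v0 and alpha are assumed values.
    """
    errors = []
    v = v0
    for _ in range(max_n + 1):
        errors.append(1.0 - v)   # error if reward (r = 1) arrives now
        v += alpha * (0.0 - v)   # unrewarded trial (r = 0) lowers v
    return errors


def contextual_errors(cond_prob):
    """Context-aware learner: error on a rewarded trial after n unrewarded trials.

    cond_prob[n] is the assumed probability of reward given n preceding
    unrewarded trials; the prediction is that conditional probability.
    """
    return [1.0 - p for p in cond_prob]


# Hypothetical conditional reward probabilities that rise with n.
ASSUMED_COND_PROB = [0.1, 0.3, 0.6, 1.0]

conv = conventional_errors(v0=0.5, alpha=0.2, max_n=3)
ctx = contextual_errors(ASSUMED_COND_PROB)

for n, (e_conv, e_ctx) in enumerate(zip(conv, ctx)):
    print(f"n={n}: context-free error={e_conv:.3f}, context-aware error={e_ctx:.3f}")
```

Under these assumptions the context-free error grows with n, while the error computed from the conditional probability shrinks with n, matching the direction of the two correlations described in the abstract.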