Lak Armin, Stauffer William R, Schultz Wolfram
Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom.
eLife. 2016 Oct 27;5:e18044. doi: 10.7554/eLife.18044.
Economic theories posit reward probability as one of the factors defining reward value. Individuals learn the value of cues that predict probabilistic rewards from experienced reward frequencies. Building on the notion that responses of dopamine neurons increase with reward probability and expected value, we asked how dopamine neurons in monkeys acquire this value signal, which may represent an economic decision variable. We found in a Pavlovian learning task that reward probability-dependent value signals arose from experienced reward frequencies. We then assessed neuronal response acquisition during choices among probabilistic rewards. Here, dopamine responses became sensitive to the value of both chosen and unchosen options. Both experiments also showed novelty responses of dopamine neurons that decreased as learning advanced. These results show that dopamine neurons acquire predictive value signals from the frequency of experienced rewards. This flexible and fast signal reflects a specific decision variable and could update neuronal decision mechanisms.