Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA.
Psychology Department, Princeton University, Princeton, NJ, USA.
Nat Neurosci. 2023 May;26(5):830-839. doi: 10.1038/s41593-023-01310-x. Epub 2023 Apr 20.
Dopamine neuron activity is tied to the prediction error in temporal difference reinforcement learning models. These models make significant simplifying assumptions, particularly with regard to the structure of the predictions fed into the dopamine neurons, which consist of a single chain of timepoint states. Although this predictive structure can explain error signals observed in many studies, it cannot cope with settings where subjects might infer multiple independent events and outcomes. In the present study, we recorded dopamine neurons in the ventral tegmental area in such a setting to test the validity of the single-stream assumption. Rats were trained in an odor-based choice task, in which the timing and identity of one of several rewards delivered in each trial changed across trial blocks. This design revealed an error signaling pattern that requires the dopamine neurons to access and update multiple independent predictive streams reflecting the subject's belief about timing and potentially unique identities of expected rewards.
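The single-stream assumption described above can be illustrated with a minimal sketch of temporal difference learning over one chain of timepoint states. This is not the model or analysis code from the study; the function names `td_errors_single_chain` and `probe_errors`, the learning rate, the discount factor, and the four-step trial are all hypothetical choices made for illustration.

```python
def td_errors_single_chain(rewards, n_trials=200, alpha=0.1, gamma=1.0):
    """TD(0) over one chain of timepoint states (one state per time step).

    rewards[t] is the reward received on leaving state t.
    Returns the prediction errors from the last trial and the learned values.
    All parameter values are illustrative assumptions.
    """
    n_steps = len(rewards)
    values = [0.0] * (n_steps + 1)  # last entry is a terminal state fixed at 0
    deltas = [0.0] * n_steps
    for _ in range(n_trials):
        for t in range(n_steps):
            # Prediction error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
            deltas[t] = rewards[t] + gamma * values[t + 1] - values[t]
            values[t] += alpha * deltas[t]
    return deltas, values


def probe_errors(values, rewards, gamma=1.0):
    """Prediction errors for one trial under fixed values (no learning)."""
    return [rewards[t] + gamma * values[t + 1] - values[t]
            for t in range(len(rewards))]


# Train with a reward at time step 2, then probe with the reward moved to step 1.
trained_deltas, values = td_errors_single_chain([0.0, 0.0, 1.0, 0.0])
shifted_deltas = probe_errors(values, [0.0, 1.0, 0.0, 0.0])
```

After training, the errors on the trained schedule are close to zero. When the reward arrives one step early, the chain signals a positive error at the new time and a negative error at the old time. Because there is only one chain, this sketch has no way to hold separate predictions for several independent rewards, which is the limitation the study was designed to test.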