Wen Chentao, Ogura Yukiko, Matsushima Toshiya
Graduate School of Life Science, Hokkaido University, Sapporo, Japan.
Department of Psychiatry, Graduate School of Medicine, Hokkaido University, Sapporo, Japan; Japan Society for Promotion of Sciences, Tokyo, Japan.
Front Neurosci. 2016 Nov 8;10:476. doi: 10.3389/fnins.2016.00476. eCollection 2016.
To ensure survival, animals must update the internal representations of their environment in a trial-and-error fashion. Psychological studies of associative learning and neurophysiological analyses of dopaminergic neurons have suggested that this updating process involves the temporal-difference (TD) method in the basal ganglia network. However, the way in which the component variables of the TD method are implemented at the neuronal level is unclear. To investigate the underlying neural mechanisms, we trained domestic chicks to associate color cues with food rewards. We recorded neuronal activities from the medial striatum or tegmentum in a freely behaving condition and examined how reward omission changed neuronal firing. To compare neuronal activities with the signals assumed in the TD method, we simulated the behavioral task in the form of a finite sequence composed of discrete steps of time. The three signals assumed in the simulated task were the prediction signal, the target signal for updating, and the TD-error signal. In both the medial striatum and tegmentum, the majority of recorded neurons were categorized into three types according to how well they fit three models, though these neurons tended to form a continuous spectrum without distinct differences in the firing rate. Specifically, two types of striatal neurons successfully mimicked the target signal and the prediction signal. A linear summation of these two types of striatal neurons was a good fit for the activity of one type of tegmental neuron mimicking the TD-error signal. The present study thus demonstrates that the striatum and tegmentum can convey the signals critically required for the TD method. Based on the theoretical and neurophysiological studies, together with tract-tracing data, we propose a novel model to explain how the convergence of signals represented in the striatum could lead to the computation of TD error in tegmental dopaminergic neurons.
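The three signals named in the abstract can be illustrated with a minimal TD(0) sketch over a finite sequence of discrete time steps. This is not the authors' simulation: the number of steps, the learning rate `alpha`, the discount factor `gamma`, the single reward at the final step, and the function names `run_trial` and `simulate` are all assumptions chosen for illustration. The sketch uses the standard textbook definitions, in which the target is the reward plus the discounted next prediction and the TD error is the target minus the current prediction.

```python
def run_trial(values, rewards, alpha=0.1, gamma=0.9):
    """Run one trial and update `values` in place with TD(0).

    Returns the prediction, target, and TD-error signals at every step.
    All parameter values are illustrative assumptions, not from the paper.
    """
    n = len(values)
    prediction, target, td_error = [], [], []
    for t in range(n):
        # The value after the final step is taken to be zero (end of trial).
        v_next = values[t + 1] if t + 1 < n else 0.0
        p = values[t]                    # prediction signal
        g = rewards[t] + gamma * v_next  # target signal for updating
        d = g - p                        # TD-error signal
        values[t] = p + alpha * d        # move the prediction toward the target
        prediction.append(p)
        target.append(g)
        td_error.append(d)
    return prediction, target, td_error


def simulate(n_steps=5, n_training=500, alpha=0.1, gamma=0.9):
    """Train on rewarded trials, then probe a single reward-omission trial."""
    values = [0.0] * n_steps
    rewarded = [0.0] * (n_steps - 1) + [1.0]  # reward only at the last step
    for _ in range(n_training):
        run_trial(values, rewarded, alpha, gamma)
    trained = list(values)
    omitted = [0.0] * n_steps                 # same trial with the reward withheld
    signals = run_trial(list(values), omitted, alpha, gamma)
    return trained, signals


trained, (prediction, target, td_error) = simulate()
```

Under these assumptions, the TD error is close to zero at every step of the omission trial except the step where the reward was expected, where it turns negative. Because the TD error here is simply the target minus the prediction, the sketch also shows why a linear combination of a target-like signal and a prediction-like signal could in principle reproduce a TD-error-like signal.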