Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
Geneva Finance Research Institute and Center for Affective Sciences, University of Geneva, Switzerland.
Neuroimage. 2020 Jul 1;214:116766. doi: 10.1016/j.neuroimage.2020.116766. Epub 2020 Apr 2.
Organisms use rewards to navigate and adapt to (uncertain) environments. Error-based learning about rewards is supported by the dopaminergic system, which is thought to signal reward prediction errors that are used to adjust past predictions. More recently, the phasic dopamine response has been suggested to have two components: a first, rapid component is thought to signal the detection of a potentially rewarding stimulus; a second, slightly later component characterizes the stimulus by its reward prediction error. Error-based learning signals have also been found for risk. However, whether the neural generators of these signals employ a two-component coding scheme like the dopaminergic system is unknown. Here, using human high-density EEG, we ask whether risk learning, or more generally surprise-based learning under uncertainty, similarly consists of two temporally dissociable components. Using a simple card game, we show that the risk prediction error is reflected in the amplitude of the P3b component. This P3b modulation is preceded by an earlier component that is modulated by stimulus salience. Source analyses are compatible with the idea that both the early salience signal and the later risk prediction error signal are generated in insular, frontal, and temporal cortex. The identified sources are parts of the risk processing network that receives input from noradrenergic cells in the locus coeruleus. Finally, the P3b amplitude modulation is mirrored by an analogous modulation of pupil size, which is consistent with the idea that both the P3b and pupil size indirectly reflect locus coeruleus activity.
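The abstract does not spell out how a risk prediction error is computed. One common formalization in the risk-learning literature works as follows. The reward prediction error is the outcome minus the expected reward. The risk prediction error is the squared reward prediction error minus the expected risk, taken as the outcome variance. The sketch below assumes that formalization; it is not necessarily the exact model fitted in this study. The ±1 binary gamble and the helper names (`expected_reward`, `risk`, `prediction_errors`) are hypothetical illustrations, not the card game or code used by the authors.

```python
import math
from typing import Sequence, Tuple


def expected_reward(outcomes: Sequence[float], probs: Sequence[float]) -> float:
    """Mean of a discrete outcome distribution (hypothetical helper)."""
    return sum(p * r for r, p in zip(outcomes, probs))


def risk(outcomes: Sequence[float], probs: Sequence[float]) -> float:
    """Expected risk, assumed here to be the variance of the outcomes."""
    mu = expected_reward(outcomes, probs)
    return sum(p * (r - mu) ** 2 for r, p in zip(outcomes, probs))


def prediction_errors(
    outcome: float, outcomes: Sequence[float], probs: Sequence[float]
) -> Tuple[float, float]:
    """Return (reward prediction error, risk prediction error) for one outcome.

    Assumed definitions: delta = r - E[r]; xi = delta**2 - Var[r].
    """
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    delta = outcome - expected_reward(outcomes, probs)
    xi = delta ** 2 - risk(outcomes, probs)
    return delta, xi


if __name__ == "__main__":
    # Hypothetical gamble: win +1 with probability p_win, otherwise lose 1.
    outcomes = [1.0, -1.0]
    for p_win in (0.5, 0.9):
        probs = [p_win, 1.0 - p_win]
        for r in outcomes:
            delta, xi = prediction_errors(r, outcomes, probs)
            print(f"p_win={p_win:.1f} outcome={r:+.0f} delta={delta:+.2f} xi={xi:+.2f}")
```

Under these assumptions, a 50/50 gamble yields a risk prediction error of zero for either outcome, because every outcome deviates from the mean by exactly the expected amount. A lopsided gamble behaves differently. The unlikely loss produces a large positive risk prediction error, meaning the outcome was more surprising than the expected risk predicted.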