Williams Chad C, Hassall Cameron D, Trska Robert, Holroyd Clay B, Krigolson Olave E
Centre for Biomedical Research, University of Victoria, Victoria, British Columbia, V8W 2Y2, Canada.
Biol Psychol. 2017 Oct;129:265-272. doi: 10.1016/j.biopsycho.2017.09.007. Epub 2017 Sep 18.
Comparisons between expectations and outcomes are critical for learning. Termed prediction errors, the violations of expectancy that occur when outcomes differ from expectations are used to modify value and shape behaviour. In the present study, we examined how a wide range of expectancy violations impacted neural signals associated with feedback processing. Participants performed a time estimation task in which they had to guess the duration of one second while their electroencephalogram was recorded. In a key manipulation, we varied task difficulty across the experiment to create a range of different feedback expectancies: reward feedback was very expected, expected, 50/50, unexpected, or very unexpected. As predicted, the amplitude of the reward positivity, a component of the human event-related brain potential associated with feedback processing, scaled inversely with expectancy (e.g., unexpected feedback yielded a larger reward positivity than expected feedback). Interestingly, the scaling of the reward positivity to outcome expectancy was not linear, as would be predicted by some theoretical models. Specifically, we found that the amplitude of the reward positivity was approximately equivalent for very expected and expected feedback, and for very unexpected and unexpected feedback. As such, our results demonstrate a sigmoidal relationship between reward expectancy and the amplitude of the reward positivity, with interesting implications for theories of reinforcement learning.
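The contrast between linear and sigmoidal scaling can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration: the abstract does not report the reward probabilities for the five conditions, the fitted curve, or any amplitudes, so the probabilities, the `steepness` parameter, and the function names below are hypothetical. The linear response is the standard prediction error for a delivered reward of 1 (outcome minus expectation); the sigmoidal response passes that same quantity through a logistic function.

```python
import math

# Hypothetical reward probabilities for the five expectancy conditions.
# These values are NOT from the study; they only illustrate the idea.
conditions = {
    "very unexpected": 0.10,
    "unexpected": 0.25,
    "50/50": 0.50,
    "expected": 0.75,
    "very expected": 0.90,
}


def linear_response(p):
    """Prediction error for a delivered reward of 1: outcome minus expectation."""
    return 1.0 - p


def sigmoidal_response(p, steepness=12.0):
    """Logistic squashing of the prediction error, centred on 0.5.

    The steepness value is an arbitrary illustrative choice.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (linear_response(p) - 0.5)))


for name, p in conditions.items():
    print(f"{name:>16}: linear={linear_response(p):.2f}  "
          f"sigmoidal={sigmoidal_response(p):.2f}")
```

Under these assumed values, the linear response separates the two extreme conditions at each end as strongly as the steps near 50/50, whereas the logistic response nearly saturates at both ends, so very expected and expected outcomes (and very unexpected and unexpected outcomes) yield similar responses. This matches the qualitative pattern the abstract describes, without reproducing any of its data.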