Wellcome Integrative Neuroimaging (WIN), Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom.
Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands.
PLoS Biol. 2020 Oct 30;18(10):e3000899. doi: 10.1371/journal.pbio.3000899. eCollection 2020 Oct.
Animals learn from the past to make predictions. These predictions are adjusted after prediction errors, i.e., after surprising events. Most reward prediction error models learn the average expected amount of reward. However, here we demonstrate the existence of distinct mechanisms for detecting other types of surprising events. Six macaques learned to respond to visual stimuli to receive varying amounts of juice reward. Most trials ended with the delivery of either 1 or 3 juice drops, so that animals learned to expect 2 juice drops on average even though instances of precisely 2 drops were rare. To encourage learning, we also included sessions during which the ratio between 1 and 3 drops changed. Additionally, in all sessions, the stimulus sometimes appeared in an unexpected location. Thus, 3 types of surprising events could occur: reward amount surprise (i.e., a scalar reward prediction error), rare reward surprise, and visuospatial surprise. Importantly, we can dissociate scalar reward prediction errors (rewards that deviated from the average reward amount expected) from rare reward events (rewards that accorded with the average reward expectation but that rarely occurred). We linked each type of surprise to a distinct pattern of neural activity using functional magnetic resonance imaging. Activity in the vicinity of the dopaminergic midbrain only reflected surprise about the amount of reward. Lateral prefrontal cortex had a more general role in detecting surprising events. Posterior lateral orbitofrontal cortex specifically detected rare reward events regardless of whether they followed average reward amount expectations, but only in learnable reward environments.
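The dissociation between the two reward-related surprise types can be illustrated with a minimal numerical sketch. The outcome probabilities below are hypothetical (the abstract states only that 2-drop outcomes were rare), and quantifying rarity as Shannon surprisal is an assumption made for illustration, not necessarily the measure used in the study. The function names are likewise illustrative.

```python
import math

# Hypothetical outcome probabilities (NOT taken from the paper):
# 1 and 3 drops are common, exactly 2 drops is rare.
outcome_probs = {1: 0.4375, 2: 0.125, 3: 0.4375}


def expected_reward(probs):
    """Average reward amount implied by the outcome distribution."""
    return sum(amount * p for amount, p in probs.items())


def scalar_rpe(amount, probs):
    """Scalar reward prediction error: received amount minus expected amount."""
    return amount - expected_reward(probs)


def rarity_surprise(amount, probs):
    """Assumed rarity measure: Shannon surprisal (in bits) of the outcome."""
    return -math.log2(probs[amount])


if __name__ == "__main__":
    for amount in sorted(outcome_probs):
        print(
            f"{amount} drop(s): "
            f"RPE = {scalar_rpe(amount, outcome_probs):+.2f}, "
            f"rarity surprise = {rarity_surprise(amount, outcome_probs):.2f} bits"
        )
```

Under these assumed probabilities, a 2-drop outcome matches the average expectation, so its scalar prediction error is zero, yet it carries the largest rarity surprise. The 1-drop and 3-drop outcomes show the opposite pattern: nonzero prediction errors but low rarity surprise.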