Psychology Department and Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 08540, USA.
J Neurosci. 2012 Jan 11;32(2):551-62. doi: 10.1523/JNEUROSCI.5498-10.2012.
Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this sensitivity are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested to correlate with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
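The contrast between mean-only learning and risk-sensitive learning can be illustrated with a small simulation. The sketch below is not taken from the abstract: it assumes a simple prediction-error update with separate learning rates for positive and negative prediction errors, which is one common way to make learned values depend on variance. The function names, learning rates, and reward magnitudes (a sure 20 versus an equiprobable 0 or 40) are all hypothetical choices for illustration.

```python
import random


def learn_value(rewards, alpha_pos, alpha_neg, v0=0.0):
    """Return the trial-by-trial learned value of one cue.

    Hypothetical sketch: the value is nudged toward each reward by a
    prediction error, with a learning rate that depends on the error's sign.
    """
    v = v0
    trajectory = []
    for r in rewards:
        delta = r - v  # reward prediction error on this trial
        alpha = alpha_pos if delta > 0 else alpha_neg
        v += alpha * delta
        trajectory.append(v)
    return trajectory


def mean_late_value(trajectory, burn_in=1000):
    """Average the learned value after an initial burn-in period."""
    late = trajectory[burn_in:]
    return sum(late) / len(late)


rng = random.Random(0)  # fixed seed so the run is reproducible
n_trials = 5000

# Two cues with equal mean reward (20) but different variance (assumed values).
sure_rewards = [20.0] * n_trials
risky_rewards = [rng.choice([0.0, 40.0]) for _ in range(n_trials)]

# Equal learning rates: the learner tracks the mean and ignores variance.
risky_symmetric = mean_late_value(learn_value(risky_rewards, 0.1, 0.1))

# Larger rate for negative errors: the risky cue ends up valued below its mean.
sure_value = mean_late_value(learn_value(sure_rewards, 0.1, 0.2))
risky_asymmetric = mean_late_value(learn_value(risky_rewards, 0.1, 0.2))

print("sure cue:", round(sure_value, 2))
print("risky cue, symmetric rates:", round(risky_symmetric, 2))
print("risky cue, asymmetric rates:", round(risky_asymmetric, 2))
```

With equal learning rates, both cues settle near the same value, so the learner is blind to risk. When negative prediction errors are weighted more heavily, the risky cue's learned value falls below that of the sure cue even though their mean rewards match, which is the kind of risk-modulated value the abstract describes.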