d'Acremont Mathieu, Lu Zhong-Lin, Li Xiangrui, Van der Linden Martial, Bechara Antoine
National Centre of Competence in Research (NCCR) in Affective Sciences, University of Geneva, Geneva, Switzerland.
Neuroimage. 2009 Oct 1;47(4):1929-39. doi: 10.1016/j.neuroimage.2009.04.096. Epub 2009 May 13.
Behavioral studies have shown for decades that humans are sensitive to risk when making decisions. More recently, brain activity has been shown to correlate with risky choices. But an important gap remains: how does the human brain learn which decisions are risky? In cognitive neuroscience, reinforcement learning has never been used to estimate reward variance, a common measure of risk in economics and psychology. It is thus unknown which brain regions are involved in risk learning. To address this question, participants completed a decision-making task during fMRI. They chose repeatedly from four decks of cards, and each selection was followed by a stochastic payoff. Expected reward and risk differed among the decks. Participants' aim was to maximize payoffs. Risk and reward prediction errors were calculated after each payoff using a novel reinforcement learning model. For reward prediction error, the strongest correlation was found with the BOLD response in the striatum. For risk prediction error, the strongest correlations were found with the BOLD responses in the insula and inferior frontal gyrus. We conclude that risk and reward prediction errors are processed by distinct neural circuits during reinforcement learning. Additional analyses revealed that the BOLD response in the inferior frontal gyrus was more pronounced in risk-averse participants, suggesting that this region also serves to inhibit risky choices.
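The abstract does not give the equations of its reinforcement learning model, so the following is only a minimal sketch of how reward and risk prediction errors could be computed trial by trial. It assumes a simple delta-rule learner in which risk is tracked as the running estimate of reward variance, and the risk prediction error is the squared reward prediction error minus that estimate. The function names, the learning rate `alpha`, and the Gaussian payoffs are illustrative assumptions, not the authors' specification.

```python
import random


def update(value, risk, payoff, alpha=0.1):
    """One learning step for a single deck.

    ASSUMPTION: delta-rule form chosen for illustration; the paper's
    actual model is not specified in the abstract.
    """
    # Reward prediction error: observed payoff minus expected reward.
    reward_pe = payoff - value
    # Risk prediction error: squared surprise minus expected variance.
    risk_pe = reward_pe ** 2 - risk
    # Move both estimates a fraction alpha toward the new evidence.
    value += alpha * reward_pe
    risk += alpha * risk_pe
    return value, risk, reward_pe, risk_pe


def simulate(mean, sd, n_trials=5000, alpha=0.02, seed=0):
    """Learn the expected reward and risk of one hypothetical deck.

    ASSUMPTION: payoffs are drawn from a Gaussian; the real task's
    payoff distribution is not described in the abstract.
    """
    rng = random.Random(seed)
    value, risk = 0.0, 0.0
    for _ in range(n_trials):
        payoff = rng.gauss(mean, sd)
        value, risk, _, _ = update(value, risk, payoff, alpha)
    return value, risk


if __name__ == "__main__":
    # Two decks with the same expected reward but different risk.
    for label, sd in [("low-risk deck", 1.0), ("high-risk deck", 4.0)]:
        value, risk = simulate(mean=5.0, sd=sd)
        print(f"{label}: value estimate {value:.2f}, risk estimate {risk:.2f}")
```

In a sketch like this, the per-trial `reward_pe` and `risk_pe` series are the kind of quantities that could serve as parametric regressors against the BOLD signal, which is the role the abstract describes for its prediction errors.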