Sumiya Motofumi, Katahira Kentaro
Department of Cognitive and Psychological Sciences, Graduate School of Informatics, Nagoya University, Nagoya, Japan.
Japan Society for the Promotion of Science, Tokyo, Japan.
Front Neurosci. 2020 Sep 8;14:852. doi: 10.3389/fnins.2020.00852. eCollection 2020.
Surprise arises from the difference between a decision outcome and the predicted outcome (the prediction error), regardless of whether the error is positive or negative. It has recently been postulated that surprise modulates the reward value of an action outcome; studies have indicated that larger surprise, measured as the absolute value of the prediction error, decreases the value of the outcome. However, how surprise affects the outcome value and subsequent decision making remains unclear. We suggest that, under the assumption that surprise decreases the outcome value, agents will make more risk-averse choices when outcomes are frequently surprising. Here, we propose the surprise-sensitive utility model, a reinforcement learning model in which surprise decreases the outcome value, to explain how surprise affects subsequent decision making. To investigate the properties of the proposed model, we compared it with previous reinforcement learning models in simulations of two probabilistic learning tasks. The proposed model accounts for risk-averse choices, as the previous models do, and the proportion of risk-averse choices increases as the surprise-based modulation parameter of the outcome value increases. We also performed statistical model selection using two experimental datasets from different tasks. The proposed model fit these datasets better than the other models with the same number of free parameters, indicating that it better captures the trial-by-trial dynamics of choice behavior.
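The abstract does not give the model's equations, but the described mechanism can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the surprise-discounted utility takes the form u = r - κ·|r - Q| (κ being the surprise-based modulation parameter) inside a standard Q-learning update with softmax choice. The task, parameter values, and function names are illustrative assumptions. The sketch shows the qualitative prediction from the abstract: a surprise-sensitive agent (κ > 0) prefers a safe option over a risky option of equal expected value, because the risky option's outcomes are more surprising.

```python
import math
import random

def simulate(kappa, alpha=0.2, beta=5.0, n_trials=2000, seed=0):
    """Two-armed bandit: a 'safe' arm always pays 0.5; a 'risky' arm pays
    1.0 or 0.0 with equal probability (same expected value). A softmax
    agent learns with the assumed surprise-sensitive utility update:
        u = r - kappa * |r - Q|     (surprise discounts the outcome value)
        Q <- Q + alpha * (u - Q)
    Returns the fraction of trials on which the safe arm was chosen."""
    rng = random.Random(seed)
    Q = [0.0, 0.0]  # Q[0] = safe arm, Q[1] = risky arm
    safe_count = 0
    for _ in range(n_trials):
        # Softmax (logistic) choice probability for the safe arm
        p_safe = 1.0 / (1.0 + math.exp(-beta * (Q[0] - Q[1])))
        choice = 0 if rng.random() < p_safe else 1
        # Sample the outcome for the chosen arm
        r = 0.5 if choice == 0 else (1.0 if rng.random() < 0.5 else 0.0)
        # Surprise (absolute prediction error) lowers the effective outcome value
        u = r - kappa * abs(r - Q[choice])
        Q[choice] += alpha * (u - Q[choice])
        safe_count += (choice == 0)
    return safe_count / n_trials
```

With κ = 0 the model reduces to ordinary Q-learning and the agent is roughly indifferent between the arms; with κ > 0 the risky arm's learned value is pulled below its expected payoff by its larger prediction errors, so the safe-choice fraction rises with κ, mirroring the risk-aversion result described above.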