Laboratoire de Neurosciences Cognitives Computationnelles, Institut National de la Santé et de la Recherche Médicale, 29 rue d'Ulm, 75005, Paris, France.
Département d'Etudes Cognitives, Ecole Normale Supérieure, Paris, 75005, France.
Nat Commun. 2018 Oct 29;9(1):4503. doi: 10.1038/s41467-018-06781-2.
In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated into learning algorithms has received comparatively little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model comparison indicates that subjects' behavior is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation emerges progressively, is favored by increasing outcome information, and correlates with explicit understanding of the task structure. Finally, our data clearly show that, while locally adaptive (for instance in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
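As a rough illustration of what reference-point dependence and range adaptation could look like in a learning rule, the sketch below is a hypothetical parameterisation, not the authors' fitted model: it tracks a context-level value (reference point) and a context-level spread alongside option values, then centres each outcome on the context value and divides by the learned spread before updating the chosen option. Names such as RelativeQLearner, alpha_ctx, and the spread estimate are illustrative assumptions.

```python
import numpy as np

class RelativeQLearner:
    """Sketch of a state-dependent valuation learner (hypothetical parameterisation)."""

    def __init__(self, n_options, alpha=0.3, alpha_ctx=0.3, beta=5.0):
        self.q = np.zeros(n_options)  # context-relative option values
        self.v = 0.0                  # context value (reference point)
        self.s = 1.0                  # context spread (range-like scaling factor)
        self.alpha, self.alpha_ctx, self.beta = alpha, alpha_ctx, beta

    def choose(self):
        # softmax choice over context-relative option values
        p = np.exp(self.beta * self.q)
        p /= p.sum()
        return np.random.choice(len(self.q), p=p)

    def update(self, option, outcome):
        # update the reference point and the spread from the raw outcome
        self.v += self.alpha_ctx * (outcome - self.v)
        self.s += self.alpha_ctx * (abs(outcome - self.v) - self.s)
        # relative outcome: centred on the context value, divided by the spread
        rel = (outcome - self.v) / max(self.s, 1e-6)
        # standard delta-rule update on the relative scale
        self.q[option] += self.alpha * (rel - self.q[option])
```

On this relative scale, the better option of a negative-valence, small-magnitude context still acquires the higher value, which is locally adaptive; but because values are stored context-relative, directly comparing options drawn from different contexts can yield the kind of seemingly irrational extrapolation choices described above.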