Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, Paris, France.
Département d'Études Cognitives, École Normale Supérieure, PSL University, Paris, France.
Elife. 2023 Jul 10;12:e83891. doi: 10.7554/eLife.83891.
Reinforcement learning research in humans and other species indicates that rewards are represented in a context-dependent manner. More specifically, reward representations seem to be normalized as a function of the value of the alternative options. The dominant view, inspired by perceptual decision-making research, postulates that value context-dependence is achieved via a divisive normalization rule. However, behavioral and neural evidence points to another plausible mechanism: range normalization. Critically, previous experimental designs were ill-suited to disentangle the divisive and the range normalization accounts, which generate similar behavioral predictions in many circumstances. To address this question, we designed a new learning task in which we manipulated, across learning contexts, the number of options and the value ranges. Behavioral and computational analyses falsify the divisive normalization account and instead support the range normalization rule. Together, these results shed new light on the computational mechanisms underlying context-dependence in learning and decision-making.
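The contrast between the two accounts can be made concrete with their standard textbook forms. A minimal sketch follows, using the conventional formulas for divisive normalization (each value divided by the semi-saturated sum of all values in the context) and range normalization (min-max rescaling); these are generic formulations, not the specific model variants fitted in this paper.

```python
def divisive_normalization(values, sigma=0.0):
    """Divide each value by sigma plus the sum of all values in the context.

    Standard divisive-normalization form; the output for any one option
    depends on every option in the set, so adding an option changes all
    normalized values.
    """
    total = sigma + sum(values)
    return [v / total for v in values]


def range_normalization(values):
    """Rescale each value by the min-max range of the context.

    The output depends only on the context's minimum and maximum, so
    adding an intermediate option leaves the other normalized values
    unchanged.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


rewards = [0.0, 50.0, 100.0]
print(divisive_normalization(rewards))  # each value relative to the sum
print(range_normalization(rewards))     # [0.0, 0.5, 1.0]
```

This difference is what makes the paper's design diagnostic: manipulating the number of options changes the sum (hence divisive predictions) while leaving the min-max range, and thus range-normalization predictions, intact.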