Champalimaud Neuroscience Programme, Champalimaud Foundation, Lisbon, Portugal.
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA.
Nat Neurosci. 2022 Jun;25(6):738-748. doi: 10.1038/s41593-022-01085-7. Epub 2022 Jun 6.
Reward expectations based on internal knowledge of the external environment are a core component of adaptive behavior. However, internal knowledge may be inaccurate or incomplete due to errors in sensory measurements. Some features of the environment may also be encoded inaccurately to minimize representational costs associated with their processing. In this study, we investigated how reward expectations are affected by features of internal representations by studying behavior and dopaminergic activity while mice make time-based decisions. We show that several possible representations allow a reinforcement learning agent to model animals' overall performance during the task. However, only a small subset of highly compressed representations simultaneously reproduced the co-variability in animals' choice behavior and dopaminergic activity. Strikingly, these representations predict an unusual distribution of response times that closely matches animals' behavior. These results inform how constraints of representational efficiency may be expressed in encoding representations of dynamic cognitive variables used for reward-based computations.