Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG London, United Kingdom;
Department for Imaging Neurosciences, Max Planck University College London Centre for Computational Psychiatry and Ageing Research, WC1B 5EH London, United Kingdom.
Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15871-15876. doi: 10.1073/pnas.1821647116. Epub 2019 Jul 18.
Model-free learning enables an agent to make better decisions based on prior experience while representing only minimal knowledge about an environment's structure. It is generally assumed that model-free state representations are based on outcome-relevant features of the environment. Here, we challenge this assumption by providing evidence that a putative model-free system assigns credit to task representations that are irrelevant to an outcome. We examined data from 769 individuals performing a well-described 2-step reward decision task where stimulus identity, but not spatial-motor aspects of the task, predicted reward. We show that participants assigned value to spatial-motor representations even though these were outcome-irrelevant. Strikingly, spatial-motor value associations affected behavior across all outcome-relevant features and stages of the task, consistent with credit assignment to low-level state-independent task representations. Individual difference analyses suggested that the impact of spatial-motor value formation was attenuated for individuals who showed greater deployment of goal-directed (model-based) strategies. Our findings highlight a need for a reconsideration of how model-free representations are formed and regulated according to the structure of the environment.
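The idea of credit being assigned to an outcome-irrelevant spatial-motor feature can be illustrated with a minimal sketch. This is not the model fitted in the paper; it assumes a simple delta-rule update applied separately to stimulus identity and to response side, and every name in it (td_update, assign_credit, net_value, alpha, w_side) is hypothetical.

```python
# Illustrative sketch only: a delta-rule learner that credits both the chosen
# stimulus (outcome-relevant) and the response side (outcome-irrelevant).
# All names and parameter values are assumptions, not taken from the paper.

def td_update(values, key, reward, alpha):
    """Move values[key] toward the received reward by a fraction alpha."""
    values[key] += alpha * (reward - values[key])

def assign_credit(q_stim, q_side, stim, side, reward, alpha=0.5):
    """Credit the outcome to the stimulus and, separately, to the side pressed."""
    td_update(q_stim, stim, reward, alpha)
    td_update(q_side, side, reward, alpha)

def net_value(q_stim, q_side, stim, side, w_side=1.0):
    """Combine stimulus value with a weighted spatial-motor value."""
    return q_stim[stim] + w_side * q_side[side]

q_stim = {"A": 0.0, "B": 0.0}
q_side = {"left": 0.0, "right": 0.0}

# Trial 1: stimulus A appears on the left, is chosen, and is rewarded.
assign_credit(q_stim, q_side, "A", "left", reward=1.0)

# Trial 2: the stimuli swap sides. B was never rewarded, yet it inherits
# value from occupying the previously rewarded side.
value_a = net_value(q_stim, q_side, "A", "right")
value_b = net_value(q_stim, q_side, "B", "left")
```

In this toy setup, setting w_side to 0 would remove the spatial-motor influence, which is one loose way to picture the attenuation described for participants who rely more on model-based strategies.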