

Model-based learning retrospectively updates model-free values.

Affiliations

Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK.

Publication information

Sci Rep. 2022 Feb 11;12(1):2358. doi: 10.1038/s41598-022-05567-3.

Abstract

Reinforcement learning (RL) is widely regarded as divisible into two distinct computational strategies. Model-free learning is a simple RL process in which a value is associated with actions, whereas model-based learning relies on the formation of internal models of the environment to maximise reward. Recently, theoretical and animal work has suggested that such models might be used to train model-free behaviour, reducing the burden of costly forward planning. Here we devised a way to probe this possibility in human behaviour. We adapted a two-stage decision task and found evidence that model-based processes at the time of learning can alter model-free valuation in healthy individuals. We asked people to rate subjective value of an irrelevant feature that was seen at the time a model-based decision would have been made. These irrelevant feature value ratings were updated by rewards, but in a way that accounted for whether the selected action retrospectively ought to have been taken. This model-based influence on model-free value ratings was best accounted for by a reward prediction error that was calculated relative to the decision path that would most likely have led to the reward. This effect occurred independently of attention and was not present when participants were not explicitly told about the structure of the environment. These findings suggest that current conceptions of model-based and model-free learning require updating in favour of a more integrated approach. Our task provides an empirical handle for further study of the dialogue between these two learning systems in the future.
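The core contrast in the abstract — a standard model-free value update versus a "retrospective" update in which the reward prediction error is computed relative to the decision path that most likely led to the reward — can be sketched in a few lines. The following is an illustrative toy model only, not the authors' fitted computational model: the function names, the two-action setup, the transition probabilities, and the learning rate are all assumptions chosen for clarity.

```python
def model_free_update(value, reward, alpha=0.1):
    """Plain model-free learning: the chosen action's value moves
    toward the received reward by a reward prediction error (RPE)."""
    rpe = reward - value
    return value + alpha * rpe


def retrospective_update(values, chosen, reward, transition_probs, alpha=0.1):
    """Hypothetical retrospective credit assignment (a sketch of the
    idea in the abstract): after a reward, the internal model is used
    to ask which action most probably produced that outcome, and the
    RPE is computed against *that* action's value, even if it was not
    the one actually chosen. Unrewarded trials fall back to crediting
    the chosen action."""
    if reward > 0:
        # The action the model says most likely led to the reward.
        credited = max(transition_probs, key=transition_probs.get)
    else:
        credited = chosen
    rpe = reward - values[credited]
    new_values = dict(values)
    new_values[credited] += alpha * rpe
    return new_values
```

In this toy version, choosing action "A" but receiving a reward the model attributes mostly to "B" (e.g. transition probabilities {"A": 0.3, "B": 0.7}) updates the value of "B", mirroring the finding that irrelevant-feature value ratings were updated "in a way that accounted for whether the selected action retrospectively ought to have been taken."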


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08e9/8837618/194e9a6a88f1/41598_2022_5567_Fig1_HTML.jpg
