School of Psychology, University of East Anglia, United Kingdom.
Cognition Institute, School of Psychology, University of Plymouth, United Kingdom.
Neuroimage. 2018 Sep;178:162-171. doi: 10.1016/j.neuroimage.2018.05.023. Epub 2018 May 24.
Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign an expected value to candidate actions. Model-free learning is ignorant of the world's structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has previously been assumed that model-based and model-free learning are computationally dissociated in the brain. However, recent fMRI evidence suggests that the brain may compute reward prediction errors relative to both model-free and model-based estimates of value, signalling the possibility that these systems interact. Because of its poor temporal resolution, fMRI risks confounding reward prediction errors with other feedback-related neural activity. In the present study, EEG was used to show the presence of both model-based and model-free reward prediction errors and their place in a temporal sequence of events including state prediction errors and action value updates. This demonstration of model-based prediction errors questions a long-held assumption that model-free and model-based learning are dissociated in the brain.
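To make the distinction concrete, the following is a minimal Python sketch of the two kinds of value estimate and the reward prediction error computed against each. It is an illustration under stated assumptions, not the model fitted in the study: the delta-rule form of the update, the learning rate `alpha`, and all function and parameter names are hypothetical choices made here.

```python
# Illustrative sketch only: the delta-rule update, the learning rate, and all
# names below are assumptions, not the authors' computational model.

def model_free_update(value, reward, alpha=0.1):
    """Update a cached action value from reinforcement history alone.

    Returns the new value and the model-free reward prediction error,
    i.e. the gap between the obtained reward and the cached value.
    """
    rpe = reward - value
    return value + alpha * rpe, rpe


def model_based_value(transition_probs, state_values):
    """Expected value of an action given knowledge of the world's structure.

    transition_probs[i] is the assumed probability that the action leads to
    state i; state_values[i] is the value expected in that state.
    """
    return sum(p * v for p, v in zip(transition_probs, state_values))


def model_based_rpe(reward, transition_probs, state_values):
    """Reward prediction error relative to the model-based estimate."""
    return reward - model_based_value(transition_probs, state_values)
```

For example, with a cached value of 0.5 and a reward of 1.0, `model_free_update` yields a prediction error of 0.5. With an assumed 70% chance of reaching a state worth 1.0 and a 30% chance of reaching a state worth 0.0, the same reward gives a model-based prediction error of about 0.3. The two errors differ because they are computed against different estimates of value.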