FIDMAG Germanes Hospitalàries Research Foundation, Carrer Antoni Pujades 38, 08830 Sant Boi de Llobregat, Barcelona, Spain.
Universitat de Barcelona, Barcelona, Spain.
Brain Struct Funct. 2021 Jun;226(5):1553-1569. doi: 10.1007/s00429-021-02270-3. Epub 2021 Apr 11.
Reward prediction error, the difference between the expected and obtained reward, is known to act as a neural signal for reinforcement learning. In the current study, we propose a model fitting approach that combines behavioral and neural data to fit computational models of reinforcement learning. Briefly, we penalized subject-specific fitted parameters that deviated too far from the group median, except when that deviation improved the model's fit to neural responses. Using a probabilistic monetary learning task and fMRI, we compared our approach with standard model fitting methods. Q-learning outperformed actor-critic at both the behavioral and neural levels, although including neuroimaging data in model fitting improved the fit of the actor-critic models. We observed both action-value and state-value prediction error signals in the striatum, whereas standard model fitting approaches failed to capture state-value signals. Finally, the left ventral striatum correlated with reward prediction error and the right ventral striatum with fictive prediction error, suggesting a functional hemispheric asymmetry in prediction-error-driven learning.
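To make the two central ideas concrete, the sketch below (in Python) illustrates a Q-learning value update driven by reward prediction error, delta = r - Q(s, a), together with one plausible form of the penalized objective described in the abstract: behavioral misfit plus a penalty on a subject's deviation from the group-median parameters, relaxed when that deviation improves the neural fit. This is a minimal illustration under stated assumptions; the functional form, the names penalized_objective, behav_nll, and neural_fit_gain, and the weight lam are hypothetical and not the paper's actual specification.

    import numpy as np

    def q_update(q_value, reward, alpha):
        # Reward prediction error: obtained reward minus expected value.
        rpe = reward - q_value
        # Q-learning update: move the value estimate toward the obtained
        # reward, scaled by the learning rate alpha.
        return q_value + alpha * rpe, rpe

    def penalized_objective(params, behav_nll, neural_fit_gain,
                            group_median, lam=1.0):
        # Hypothetical objective in the spirit of the abstract: behavioral
        # negative log-likelihood plus a penalty for drifting from the
        # group median, offset when that drift improves the neural fit.
        deviation = np.sum((params - group_median) ** 2)
        penalty = lam * max(deviation - neural_fit_gain(params), 0.0)
        return behav_nll(params) + penalty

    # Toy usage with stand-in functions (illustrative only):
    theta = np.array([0.3, 2.0])    # e.g., learning rate, inverse temperature
    median = np.array([0.2, 1.5])   # group-median parameters
    loss = penalized_objective(theta,
                               behav_nll=lambda p: 10.0,       # stand-in behavioral NLL
                               neural_fit_gain=lambda p: 0.1,  # stand-in neural improvement
                               group_median=median)

One design note on this reading: subtracting the neural-fit gain inside the penalty term is one way to encode "except when that deviation improved the neural fit"; other formulations (e.g., a multiplicative down-weighting of the penalty) would express the same idea.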