Guggenmos Matthias, Wilbertz Gregor, Hebart Martin N, Sterzer Philipp
Bernstein Center for Computational Neuroscience, Berlin, Germany.
Visual Perception Laboratory, Charité Universitätsmedizin, Berlin, Germany.
eLife. 2016 Mar 29;5:e13388. doi: 10.7554/eLife.13388.
It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulty explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure of this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence, in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback.
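The idea of learning guided by confidence-based reinforcement signals can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give its equations, so everything below is an assumption made for illustration, including the two-dimensional stimulus, the `tanh` mapping from decision variable to confidence, the delta-rule update of expected confidence, the weight normalization, and all names and parameter values (`simulate`, `alpha_w`, `alpha_c`, `noise`). The only element taken from the abstract is the principle that a confidence prediction error (experienced minus anticipated confidence) plays the role that a reward prediction error plays in standard reinforcement learning.

```python
import numpy as np

def simulate(n_trials=500, alpha_w=0.05, alpha_c=0.1, noise=1.0, seed=0):
    """Toy observer that learns without external feedback (illustrative only)."""
    rng = np.random.default_rng(seed)
    signal = np.array([1.0, 0.0])   # hypothetical stimulus template
    w = np.array([0.2, 0.2])        # hypothetical initial perceptual filter
    expected_conf = 0.0             # anticipated confidence
    correct = np.zeros(n_trials, dtype=bool)
    cpe = np.zeros(n_trials)        # confidence prediction error per trial

    for t in range(n_trials):
        label = rng.choice([-1, 1])                       # true category, never shown to the learner
        x = label * signal + noise * rng.standard_normal(2)

        dv = w @ x                                        # internal decision variable
        choice = 1 if dv >= 0 else -1
        confidence = np.tanh(abs(dv))                     # assumed confidence readout in [0, 1)

        delta = confidence - expected_conf                # confidence prediction error
        w += alpha_w * delta * choice * x                 # reinforce the filter by the self-generated signal
        w /= max(1.0, np.linalg.norm(w))                  # assumed bound: keep the filter norm at most 1
        expected_conf += alpha_c * delta                  # delta-rule update of anticipated confidence

        correct[t] = (choice == label)                    # recorded for analysis only
        cpe[t] = delta

    return w, correct, cpe

if __name__ == "__main__":
    w, correct, cpe = simulate()
    print("final filter:", w)
    print("accuracy, first 100 trials:", correct[:100].mean())
    print("accuracy, last 100 trials:", correct[-100:].mean())
```

Note that `label` enters the loop only to generate the stimulus and to score accuracy afterwards; the weight update depends solely on the observer's own confidence. Whether and how quickly such a toy observer improves depends on the assumed parameters, so the sketch should be read as a structural illustration rather than a reproduction of the reported results.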