Thomas H. B. FitzGerald, Raymond J. Dolan, Karl Friston
The Wellcome Trust Centre for Neuroimaging, University College London, London, UK; Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London, UK.
The Wellcome Trust Centre for Neuroimaging, University College London, London, UK.
Front Comput Neurosci. 2015 Nov 4;9:136. doi: 10.3389/fncom.2015.00136. eCollection 2015.
Temporal difference learning models propose that phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies in which optogenetic stimulation of dopamine neurons can substitute for actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on the hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behavior. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings.
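The central claim — that dopamine sets the precision with which beliefs about action values are translated into choices, while learning itself proceeds independently — can be illustrated with a deliberately simplified toy model. The sketch below is *not* the paper's active inference scheme for Markov decision processes; it is a minimal caricature using a two-armed bandit, where a hypothetical precision parameter `gamma` scales a softmax over learned values. Lowering `gamma` (mimicking dopamine depletion) degrades choice performance while leaving value learning intact.

```python
import math
import random

def run_agent(gamma, n_trials=2000, alpha=0.1, seed=0):
    """Two-armed bandit: arm 0 pays reward with p=0.8, arm 1 with p=0.2.

    gamma plays the role of precision (the hypothesized dopamine signal):
    it scales the softmax over beliefs about action values, controlling
    how strongly behavior is driven by learned outcome differences.
    The value update (learning) is deliberately independent of gamma.
    """
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]
    q = [0.5, 0.5]               # beliefs about action values
    n_best = 0
    for _ in range(n_trials):
        # precision-weighted softmax action selection
        logits = [gamma * v for v in q]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        a = 0 if rng.random() < exps[0] / sum(exps) else 1
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] += alpha * (r - q[a])   # learning unaffected by gamma
        n_best += (a == 0)
    return q, n_best / n_trials

# Intact precision: behavior exploits the learned value difference.
q_hi, perf_hi = run_agent(gamma=8.0)
# "Depleted" precision: choices approach chance, yet values are still learned.
q_lo, perf_lo = run_agent(gamma=0.0)
```

Under this caricature, the depleted agent's performance hovers near chance even though its value estimates still track the true reward probabilities — echoing the dissociation the abstract describes, in which depletion impairs performance but spares learning.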