Adaptive properties of differential learning rates for positive and negative outcomes.

Author information

Cazé Romain D, van der Meer Matthijs A A

Affiliation

Department of Bioengineering, Imperial College, London, UK

Publication information

Biol Cybern. 2013 Dec;107(6):711-9. doi: 10.1007/s00422-013-0571-5. Epub 2013 Oct 2.

DOI:10.1007/s00422-013-0571-5

PMID:24085507

Abstract

The concept of the reward prediction error (the difference between reward obtained and reward predicted) continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static "bandit" choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
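The separation effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Bernoulli rewards coded as 0 or 1, a simple delta-rule update, and hypothetical names (`update`, `steady_state`, `expected_update`, `alpha_pos`, `alpha_neg`) chosen here for illustration. Under those assumptions, the value at which the expected update vanishes is `p * alpha_pos / (p * alpha_pos + (1 - p) * alpha_neg)`, which equals `p` only when the two rates are equal.

```python
def update(q, reward, alpha_pos, alpha_neg):
    """One delta-rule step with a learning rate chosen by the sign of the prediction error."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def steady_state(p, alpha_pos, alpha_neg):
    """Value where the expected update is zero, assuming rewards in {0, 1} with P(reward = 1) = p."""
    return p * alpha_pos / (p * alpha_pos + (1 - p) * alpha_neg)


def expected_update(q, p, alpha_pos, alpha_neg):
    """Expected one-step change in q under a Bernoulli(p) reward (assumed coding: 0 or 1)."""
    gain = update(q, 1.0, alpha_pos, alpha_neg) - q
    loss = update(q, 0.0, alpha_pos, alpha_neg) - q
    return p * gain + (1 - p) * loss


def gap(p_low, p_high, alpha_pos, alpha_neg):
    """Distance between the steady-state values of two arms."""
    return steady_state(p_high, alpha_pos, alpha_neg) - steady_state(p_low, alpha_pos, alpha_neg)


if __name__ == "__main__":
    # Illustrative two-armed task with low reward probabilities (values assumed, not from the paper).
    p_low, p_high = 0.1, 0.2
    print("symmetric gap: ", gap(p_low, p_high, 0.1, 0.1))
    print("asymmetric gap:", gap(p_low, p_high, 0.4, 0.1))
```

With these illustrative numbers, the symmetric learner's steady-state values sit 0.1 apart, while the learner with `alpha_pos = 0.4` and `alpha_neg = 0.1` separates them by roughly 0.19. The learned values are biased estimates of the reward probabilities, yet the two arms become easier to tell apart, which is the intuition the abstract describes.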
