Temporal dynamics of prediction error processing during reward-based decision making.

Affiliations

Max Planck Institute for Human Development, Berlin, 14195, Germany; Max Planck Institute for Human Cognitive & Brain Sciences, Leipzig, 04303, Germany.

Publication information

Neuroimage. 2010 Oct 15;53(1):221-32. doi: 10.1016/j.neuroimage.2010.05.052. Epub 2010 May 25.

Abstract

Adaptive decision making depends on the accurate representation of rewards associated with potential choices. These representations can be acquired with reinforcement learning (RL) mechanisms, which use the prediction error (PE, the difference between expected and received rewards) as a learning signal to update reward expectations. While EEG experiments have highlighted the role of feedback-related potentials during performance monitoring, important questions remain about the temporal sequence of feedback processing and the specific function of feedback-related potentials during reward-based decision making. Here, we hypothesized that feedback processing starts with a qualitative evaluation of outcome valence, which is subsequently complemented by a quantitative representation of PE magnitude. Results of a model-based single-trial analysis of EEG data collected during a reversal learning task showed that, around 220 ms after feedback, outcomes are initially evaluated categorically with respect to their valence (positive vs. negative). Around 300 ms, and in parallel with the maintained valence evaluation, the brain also represents quantitative information about PE magnitude, thus providing the complete information needed to update reward expectations and to guide adaptive decision making. Importantly, our single-trial EEG analysis based on PEs from an RL model showed that the feedback-related potentials do not merely reflect error awareness, but rather carry quantitative information crucial for learning reward contingencies.
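To illustrate where the trial-by-trial PE regressor in a model-based single-trial analysis of this kind can come from, the sketch below implements a standard Rescorla-Wagner style update. This is a minimal illustration, not the authors' actual model: the learning rate, the initial expectation of 0.5, and the binary reward coding are assumptions made purely for the example.

```python
import numpy as np

def prediction_errors(choices, rewards, n_options=2, learning_rate=0.3):
    """Trial-by-trial prediction errors from a simple Rescorla-Wagner update.

    choices : sequence of chosen option indices (0 .. n_options-1)
    rewards : sequence of received outcomes (e.g., 1 = win, 0 = loss)

    Returns one PE per trial (received minus expected reward), which could
    serve as a single-trial regressor for post-feedback EEG amplitudes.
    Parameter values here are illustrative, not taken from the paper.
    """
    q = np.full(n_options, 0.5)             # initial reward expectations
    pes = np.zeros(len(rewards), dtype=float)
    for t, (c, r) in enumerate(zip(choices, rewards)):
        pes[t] = r - q[c]                   # PE = received - expected reward
        q[c] += learning_rate * pes[t]      # update expectation of chosen option
    return pes

# Example: a short reversal-learning-like sequence of choices and outcomes
choices = [0, 0, 1, 0, 1, 1]
rewards = [1, 1, 0, 0, 1, 1]
print(prediction_errors(choices, rewards))
```

In an analysis along these lines, the signed PE, or its sign and magnitude entered separately, would be related to single-trial EEG amplitudes in the roughly 220 ms and 300 ms post-feedback windows discussed in the abstract.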
