Principal components analysis of reward prediction errors in a reinforcement learning task.

Affiliation

Cognition Institute, Department of Psychology, Plymouth University, Plymouth PL4 8AA, UK.

Publication information

Neuroimage. 2016 Jan 1;124(Pt A):276-286. doi: 10.1016/j.neuroimage.2015.07.032. Epub 2015 Jul 18.

Abstract

Models of reinforcement learning represent reward and punishment in terms of reward prediction errors (RPEs), quantitative signed terms describing the degree to which outcomes are better than expected (positive RPEs) or worse than expected (negative RPEs). An electrophysiological component known as the feedback-related negativity (FRN) occurs at frontocentral sites 240-340 ms after feedback indicating whether a reward or punishment has been obtained, and has been claimed to neurally encode an RPE. An outstanding question, however, is whether the FRN is sensitive to the size of both positive and negative RPEs. Previous attempts to answer this question have examined the simple effects of RPE size for positive and negative RPEs separately. However, this methodology can be compromised by overlap from components coding for unsigned prediction error size, or "salience", which are sensitive to the absolute size of a prediction error but not to its valence. In our study, positive and negative RPEs were parametrically modulated using both reward likelihood and magnitude, and principal components analysis was used to separate out overlapping components. This revealed a single RPE-encoding component responsive to the size of positive RPEs, peaking at ~330 ms and occupying the delta frequency band. Other components responsive to unsigned prediction error size were also identified, but no component sensitive to negative RPE size was found.
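
The distinction between a signed RPE and its unsigned size ("salience") can be made concrete with a small sketch. It assumes a simple expected-value model that the abstract does not spell out: on each trial a cue predicts a reward of a given magnitude with a given probability, and a non-rewarded trial pays zero. The names `Trial`, `expected_value`, `reward_prediction_error` and `salience` are hypothetical and are not taken from the study. The sketch shows why a large positive RPE and a large negative RPE can share the same salience, which is how salience-coding components can confound separate analyses of positive and negative RPE size.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical trial description; the payoff structure is an assumption.
    p_reward: float   # probability that the cue leads to a reward
    magnitude: float  # size of the reward if it is delivered
    rewarded: bool    # whether the reward was actually delivered


def expected_value(trial: Trial) -> float:
    # Assumed model: a non-rewarded trial pays 0, so the
    # expectation is simply probability times magnitude.
    return trial.p_reward * trial.magnitude


def reward_prediction_error(trial: Trial) -> float:
    # Signed RPE: obtained outcome minus expected outcome.
    # Positive when the outcome is better than expected, negative when worse.
    outcome = trial.magnitude if trial.rewarded else 0.0
    return outcome - expected_value(trial)


def salience(trial: Trial) -> float:
    # Unsigned prediction error size: ignores valence entirely.
    return abs(reward_prediction_error(trial))


if __name__ == "__main__":
    trials = [
        Trial(p_reward=0.25, magnitude=10.0, rewarded=True),   # unlikely win
        Trial(p_reward=0.75, magnitude=10.0, rewarded=True),   # likely win
        Trial(p_reward=0.75, magnitude=10.0, rewarded=False),  # likely win omitted
    ]
    for t in trials:
        print(f"RPE={reward_prediction_error(t):+.1f}  salience={salience(t):.1f}")
    # RPE=+7.5  salience=7.5
    # RPE=+2.5  salience=2.5
    # RPE=-7.5  salience=7.5
```

Under these assumptions, the unlikely win and the omitted likely win differ in the sign of their RPE but not in its size, so a component tracking salience alone responds to both equally.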
