Paul M. Bays, Ben A. Dowding
University of Cambridge, Department of Psychology, Cambridge, United Kingdom.
PLoS Comput Biol. 2017 Mar 1;13(3):e1005405. doi: 10.1371/journal.pcbi.1005405. eCollection 2017 Mar.
The ability to make optimal decisions depends on evaluating the expected rewards associated with different potential actions. This process is critically dependent on the fidelity with which reward value information can be maintained in the nervous system. Here we directly probe the fidelity of value representation following a standard reinforcement learning task. The results demonstrate a previously unrecognized bias in the representation of value: extreme reward values, both low and high, are stored significantly more accurately and precisely than intermediate rewards. This symmetry between low and high rewards persisted despite a substantially higher frequency of exposure to high rewards, which resulted from preferential exploitation of more rewarding options. The observed variation in fidelity of value representation retrospectively predicted performance on the reinforcement learning task, demonstrating that the bias in representation has an impact on decision-making. A second experiment, in which one or the other extreme-valued option was omitted from the learning sequence, showed that representational fidelity is primarily determined by the relative position of an encoded value on the scale of rewards experienced during learning. Both variability and guessing decreased with the reduction in the number of options, consistent with allocation of a limited representational resource. These findings have implications for existing models of reward-based learning, which typically assume error-free representation of reward value.
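The "standard reinforcement learning task" and the reward-based learning models the abstract refers to can be illustrated with a minimal sketch: a delta-rule (Rescorla-Wagner) learner choosing among options via softmax action selection. All parameter names and reward values below are illustrative assumptions, not taken from the paper; note that the learner's value estimates `q` are assumed to be noiselessly maintained, which is exactly the idealization the authors' findings call into question.

```python
import math
import random

def softmax_choice(values, beta, rng):
    """Sample an option index with probability proportional to exp(beta * value)."""
    weights = [math.exp(beta * v) for v in values]
    r = rng.random() * sum(weights)
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r <= cum:
            return i
    return len(weights) - 1

def run_learner(rewards, alpha=0.1, beta=3.0, trials=500, seed=0):
    """Delta-rule learner: q[i] is updated toward the obtained reward.

    Returns the learned value estimates and how often each option was chosen.
    """
    rng = random.Random(seed)
    q = [0.0] * len(rewards)      # value estimates, assumed stored without error
    counts = [0] * len(rewards)   # choice frequencies
    for _ in range(trials):
        choice = softmax_choice(q, beta, rng)
        counts[choice] += 1
        # delta-rule update: move estimate toward the reward just received
        q[choice] += alpha * (rewards[choice] - q[choice])
    return q, counts
```

Running such a learner on, say, rewards of 0.2, 0.5, and 0.8 reproduces the exposure asymmetry the abstract describes: preferential exploitation makes the high-reward option far more frequently sampled than the low-reward one, yet the paper finds that low and high extremes are nonetheless stored with symmetric fidelity.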