Samuel J. Gershman, Armin Lak
Department of Psychology and Center for Brain Science, Harvard University.
Department of Physiology, Anatomy and Genetics, University of Oxford.
bioRxiv. 2024 Sep 16:2024.09.15.613150. doi: 10.1101/2024.09.15.613150.
Limits on information-processing capacity impose limits on task performance. We show that animals achieve performance on a perceptual decision task that is near-optimal given their capacity limits, as measured by policy complexity (the mutual information between states and actions). This behavioral profile could be achieved by reinforcement learning with a penalty on high-complexity policies, realized through modulation of dopaminergic learning signals. In support of this hypothesis, we find that policy complexity suppresses midbrain dopamine responses to reward outcomes, thereby reducing behavioral sensitivity to these outcomes. Our results suggest that policy compression shapes basic mechanisms of reinforcement learning in the brain.
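To make the definition of policy complexity concrete, the following is a minimal sketch of the mutual information between states and actions for a small discrete task. It is an illustration only, not the authors' analysis code: the function name `policy_complexity`, the two-state examples, and the choice of bits as the unit are all assumptions introduced here.

```python
import math

def policy_complexity(state_probs, policy):
    """Mutual information I(S;A) in bits between states and actions.

    state_probs: list of P(s), one entry per state (hypothetical input format).
    policy: list of rows, where policy[s][a] is pi(a|s).
    """
    n_actions = len(policy[0])
    # Marginal action distribution: P(a) = sum_s P(s) * pi(a|s)
    p_a = [
        sum(p_s * row[a] for p_s, row in zip(state_probs, policy))
        for a in range(n_actions)
    ]
    total = 0.0
    for p_s, row in zip(state_probs, policy):
        for a, pi_sa in enumerate(row):
            # Zero-probability terms contribute nothing (0 * log 0 = 0).
            if p_s > 0 and pi_sa > 0:
                total += p_s * pi_sa * math.log2(pi_sa / p_a[a])
    return total

# Illustrative policies over two equally likely states (assumed values).
deterministic = policy_complexity([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
state_blind = policy_complexity([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
print(deterministic, state_blind)
```

In this sketch, a policy that maps each state to its own action has the highest complexity available with two equally likely states, while a policy that ignores the state has zero complexity. A capacity limit on policy complexity therefore restricts how strongly actions can depend on states.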