Dopamine transients encode reward prediction errors independent of learning rates.

Affiliations

Center for Neural Science, New York University, New York, NY, USA.

Publication information

Cell Rep. 2024 Oct 22;43(10):114840. doi: 10.1016/j.celrep.2024.114840. Epub 2024 Oct 11.

Abstract

Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented by corticostriatal synaptic weights, which are updated by dopamine-dependent plasticity. This suggests that dopamine release reflects the product of the learning rate and RPE. Here, we characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc) in a volatile environment. Using a task with semi-observable states offering different rewards, we find that rats adjust how quickly they initiate trials across states using RPEs. Computational modeling and behavioral analyses show that learning rates are higher following state transitions and scale with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encodes RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
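The abstract separates two quantities: the RPE (δ = reward − expected value) and the learning rate (α) that scales it in the value update V ← V + α·δ. The central finding is that NAcc dopamine tracks δ alone, not the product α·δ. The sketch below keeps the two terms apart so the distinction is concrete. It is a minimal illustration, not the authors' model. The function names and the linear rule that raises α with the absolute trial-by-trial change in belief about the hidden state are hypothetical assumptions. The paper reports only that learning rates scale with belief changes and approximate normative Bayesian strategies.

```python
def rpe(reward: float, value: float) -> float:
    """Reward prediction error: obtained reward minus expected value."""
    return reward - value


def dynamic_learning_rate(base_alpha: float, belief_change: float, gain: float) -> float:
    """HYPOTHETICAL learning-rate rule (an assumption, not the paper's model):
    alpha rises linearly with the absolute trial-by-trial change in belief
    about the hidden state, clipped to the valid range [0, 1]."""
    alpha = base_alpha + gain * abs(belief_change)
    return min(max(alpha, 0.0), 1.0)


def update_value(value: float, reward: float, alpha: float) -> tuple[float, float]:
    """Delta-rule update V <- V + alpha * RPE.

    Returns (new_value, rpe). The RPE itself does not depend on alpha,
    mirroring the finding that dopamine encodes RPEs independent of
    learning rates; alpha only enters the value update."""
    delta = rpe(reward, value)
    return value + alpha * delta, delta


if __name__ == "__main__":
    value = 0.0
    # (reward, change in belief about the hidden state) per trial;
    # illustrative numbers only, not data from the study.
    trials = [(1.0, 0.0), (1.0, 0.0), (4.0, 0.8), (4.0, 0.1)]
    for reward, belief_change in trials:
        alpha = dynamic_learning_rate(base_alpha=0.2, belief_change=belief_change, gain=0.5)
        value, delta = update_value(value, reward, alpha)
        print(f"reward={reward} alpha={alpha:.2f} rpe={delta:.2f} value={value:.2f}")
```

On the third trial, a large belief change (a likely state transition) raises α. The value estimate therefore moves further toward the new reward, while the RPE is computed exactly as on any other trial.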

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8ef6/11571066/50156759e793/nihms-2031357-f0002.jpg