
On the normative advantages of dopamine and striatal opponency for learning and choice.

Affiliations

Department of Cognitive, Linguistic and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, United States.

Publication details

eLife. 2023 Mar 22;12:e85107. doi: 10.7554/eLife.85107.

Abstract

The basal ganglia (BG) contribute to reinforcement learning (RL) and decision-making, but unlike artificial RL agents, they rely on complex circuitry and dynamic dopamine (DA) modulation of opponent striatal pathways to do so. We develop the OpAL* model to assess the normative advantages of this circuitry. In OpAL*, learning induces opponent pathways to differentially emphasize the history of positive or negative outcomes for each action. Dynamic DA modulation then amplifies the pathway best tuned to the task environment. This efficient coding mechanism avoids a vexing explore-exploit tradeoff that plagues traditional RL models in sparse reward environments. OpAL* exhibits robust advantages over alternative models, particularly in environments with sparse reward and large action spaces. These advantages depend on opponent and nonlinear Hebbian plasticity mechanisms previously thought to be pathological. Finally, OpAL* captures risky choice patterns arising from DA and environmental manipulations across species, suggesting that they result from a normative biological mechanism.
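The mechanism described above can be sketched in code. This is a minimal illustration of an OpAL-style opponent actor-critic on a two-armed bandit with sparse rewards: the Go/NoGo actor updates follow the Hebbian-scaled form of the published OpAL model, but the dopamine mapping `rho` and all parameter values here are illustrative assumptions, not the paper's exact OpAL* implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-armed bandit with sparse rewards (assumed reward probabilities).
n_actions = 2
reward_probs = [0.2, 0.1]
alpha_c, alpha_g, alpha_n, beta = 0.05, 0.1, 0.1, 2.0

V = 0.0                        # critic's estimate of reward rate
G = np.ones(n_actions)         # "Go" (D1 / direct pathway) actor weights
N = np.ones(n_actions)         # "NoGo" (D2 / indirect pathway) actor weights

for _ in range(2000):
    # Dynamic dopamine: rho tracks estimated environment richness and
    # shifts choice weight toward the better-tuned pathway (Go when
    # rewards are rich, NoGo when they are sparse). This mapping is an
    # illustrative assumption.
    rho = np.clip(2 * V - 1, -1, 1)
    beta_g = beta * (1 + rho)  # amplifies Go pathway when rho > 0
    beta_n = beta * (1 - rho)  # amplifies NoGo pathway when rho < 0

    act = beta_g * G - beta_n * N      # opponent combination of actors
    p = np.exp(act - act.max())
    p /= p.sum()                        # softmax over actions
    a = rng.choice(n_actions, p=p)

    r = float(rng.random() < reward_probs[a])
    delta = r - V                       # reward prediction error
    V += alpha_c * delta
    # Nonlinear (Hebbian) updates: each weight scales its own learning,
    # so G comes to emphasize positive outcome history and N negative.
    G[a] += alpha_g * G[a] * delta
    N[a] += alpha_n * N[a] * -delta

print(f"V={V:.3f}  G={G.round(2)}  N={N.round(2)}")
```

In this sparse-reward setting, `rho` stays negative, so the NoGo actor dominates choice, which is the efficient-coding intuition: the pathway most informative about the environment's (mostly negative) outcome statistics gets amplified.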


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b8c/10198727/ed85e67c5253/elife-85107-fig1.jpg
