
Neural Networks With Motivation.

Author Information

Shuvaev Sergey A, Tran Ngoc B, Stephenson-Jones Marcus, Li Bo, Koulakov Alexei A

Affiliations

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States.

Sainsbury Wellcome Centre, University College London, London, United Kingdom.

Publication Information

Front Syst Neurosci. 2021 Jan 11;14:609316. doi: 10.3389/fnsys.2020.609316. eCollection 2020.

Abstract

Animals rely on internal motivational states to make decisions. The role of motivational salience in decision making is in the early stages of mathematical understanding. Here, we propose a reinforcement learning framework that relies on neural networks to learn optimal ongoing behavior for dynamically changing motivation values. First, we show that neural networks implementing Q-learning with motivational salience can navigate in environments with dynamic rewards without adjustments in synaptic strengths when the needs of an agent shift. In this setting, our networks may display elements of addictive behaviors. Second, we use a similar framework in a hierarchical manager-agent system to implement a reinforcement learning algorithm with motivation that both infers motivational states and behaves. Finally, we show that, when trained in the Pavlovian conditioning setting, the responses of the neurons in our model resemble previously published neuronal recordings in the ventral pallidum, a basal ganglia structure involved in motivated behaviors. We conclude that motivation allows Q-learning networks to quickly adapt their behavior to conditions in which the expected reward is modulated by the agent's dynamic needs. Our approach addresses the algorithmic rationale of motivation and takes a step toward better interpretability of behavioral data via inference of motivational dynamics in the brain.
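To make the first result concrete, here is a minimal tabular sketch of Q-learning with motivational salience: the value table is conditioned on the motivational state, and the subjective reward is the outcome vector weighted by the current salience vector, so a shift in needs redirects behavior without any relearning. The toy environment (a 1-D track with a food port and a water port), the salience vectors, and all parameter names are illustrative assumptions; the paper itself uses neural networks rather than a lookup table.

```python
# Minimal sketch of Q-learning with motivational salience (assumed toy setup,
# not the authors' implementation): a 1-D track with food at the left end and
# water at the right end; motivation selects which outcome is rewarding.
import numpy as np

N_STATES = 5                        # positions on the 1-D track
MOTIVATIONS = [(1.0, 0.0),          # hungry: food is salient, water is not
               (0.0, 1.0)]          # thirsty: water is salient, food is not
ACTIONS = [-1, +1]                  # step left / step right
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration

rng = np.random.default_rng(0)
# Q is conditioned on the motivational state, so when needs shift the agent
# simply reads out a different slice of the same learned table.
Q = np.zeros((len(MOTIVATIONS), N_STATES, len(ACTIONS)))

def outcome(state):
    """Vector of reward amounts: food at the left end, water at the right."""
    food = 1.0 if state == 0 else 0.0
    water = 1.0 if state == N_STATES - 1 else 0.0
    return np.array([food, water])

for episode in range(2000):
    m = rng.integers(len(MOTIVATIONS))       # current motivational state
    mu = np.array(MOTIVATIONS[m])            # salience vector
    s = int(rng.integers(N_STATES))
    for t in range(20):
        a = (int(rng.integers(len(ACTIONS))) if rng.random() < eps
             else int(np.argmax(Q[m, s])))
        s_next = int(np.clip(s + ACTIONS[a], 0, N_STATES - 1))
        r = float(mu @ outcome(s_next))      # subjective reward = salience-weighted outcome
        Q[m, s, a] += alpha * (r + gamma * Q[m, s_next].max() - Q[m, s, a])
        s = s_next

# The same table, indexed by a different motivational state, yields opposite
# policies: a hungry agent heads left toward food, a thirsty one right toward water.
print(np.argmax(Q[0], axis=1))  # greedy actions when hungry (mostly 0 = left)
print(np.argmax(Q[1], axis=1))  # greedy actions when thirsty (mostly 1 = right)
```

Indexing the learned values by motivational state is the point of the sketch: switching from food-seeking to water-seeking requires no weight (here, table) updates, which mirrors the paper's claim that behavior adapts without adjustments in synaptic strengths when needs shift.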


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6e0/7848953/d33264456727/fnsys-14-609316-g001.jpg
