
Neural Networks With Motivation.

Author Information

Shuvaev Sergey A, Tran Ngoc B, Stephenson-Jones Marcus, Li Bo, Koulakov Alexei A

Affiliations

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States.

Sainsbury Wellcome Centre, University College London, London, United Kingdom.

Publication Information

Front Syst Neurosci. 2021 Jan 11;14:609316. doi: 10.3389/fnsys.2020.609316. eCollection 2020.

Abstract

Animals rely on internal motivational states to make decisions. The role of motivational salience in decision making is in the early stages of mathematical understanding. Here, we propose a reinforcement learning framework that relies on neural networks to learn optimal ongoing behavior for dynamically changing motivation values. First, we show that neural networks implementing Q-learning with motivational salience can navigate environments with dynamic rewards without adjustments in synaptic strengths when the needs of an agent shift. In this setting, our networks may display elements of addictive behaviors. Second, we use a similar framework in a hierarchical manager-agent system to implement a reinforcement learning algorithm with motivation that both infers motivational states and behaves accordingly. Finally, we show that, when trained in a Pavlovian conditioning setting, the responses of the neurons in our model resemble previously published neuronal recordings in the ventral pallidum, a basal ganglia structure involved in motivated behaviors. We conclude that motivation allows Q-learning networks to quickly adapt their behavior to conditions in which the expected reward is modulated by an agent's dynamic needs. Our approach addresses the algorithmic rationale of motivation and takes a step toward better interpretability of behavioral data via inference of motivational dynamics in the brain.
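The core mechanism the abstract describes — a scalar reward computed from reward channels weighted by the agent's current motivation, with the value function conditioned on that motivation — can be illustrated with a minimal tabular Q-learning sketch. The corridor environment, reward layout, motivation vectors, and all hyperparameters below are illustrative assumptions for exposition; the paper's own models are neural networks trained on richer tasks.

```python
import numpy as np

# Hypothetical tabular sketch of Q-learning with motivation. A 1-D corridor
# has "food" at one end and "water" at the other; a motivation vector mu
# weights the two reward channels, so the scalar reward is r = mu . rewards.
# Q-values are indexed by (motivation, state, action): the same table holds
# a distinct policy for each need, so switching needs requires no relearning.

N_STATES = 5                          # positions 0..4 on the corridor
ACTIONS = (-1, +1)                    # step left / step right
REWARD_AT = {0: np.array([1.0, 0.0]),             # food at the left end
             N_STATES - 1: np.array([0.0, 1.0])}  # water at the right end
MOTIVATIONS = [np.array([1.0, 0.0]),  # "hungry": values only food
               np.array([0.0, 1.0])]  # "thirsty": values only water

def train(episodes=2000, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((len(MOTIVATIONS), N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        m = int(rng.integers(len(MOTIVATIONS)))   # need for this episode
        s = int(rng.integers(N_STATES))
        for _ in range(20):
            greedy = int(np.argmax(q[m, s]))
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else greedy
            s2 = int(np.clip(s + ACTIONS[a], 0, N_STATES - 1))
            # Motivation modulates reward: dot product of need and channels.
            r = float(MOTIVATIONS[m] @ REWARD_AT.get(s2, np.zeros(2)))
            done = s2 in REWARD_AT                # episode ends at a reward site
            target = r + (0.0 if done else gamma * q[m, s2].max())
            q[m, s, a] += alpha * (target - q[m, s, a])
            s = s2
            if done:
                break
    return q

if __name__ == "__main__":
    q = train()
    mid = N_STATES // 2
    print("hungry agent at mid steps:", ACTIONS[int(np.argmax(q[0, mid]))])
    print("thirsty agent at mid steps:", ACTIONS[int(np.argmax(q[1, mid]))])
```

Because the Q-table is conditioned on motivation, changing the agent's need from "hungry" to "thirsty" flips its greedy policy immediately, without any further weight updates — the adaptation-without-synaptic-change property the abstract highlights.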


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6e0/7848953/d33264456727/fnsys-14-609316-g001.jpg
