
Neural Networks With Motivation.

Author Information

Shuvaev Sergey A, Tran Ngoc B, Stephenson-Jones Marcus, Li Bo, Koulakov Alexei A

Affiliations

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States.

Sainsbury Wellcome Centre, University College London, London, United Kingdom.

Publication Information

Front Syst Neurosci. 2021 Jan 11;14:609316. doi: 10.3389/fnsys.2020.609316. eCollection 2020.

Abstract

Animals rely on internal motivational states to make decisions. The role of motivational salience in decision making is in the early stages of mathematical understanding. Here, we propose a reinforcement learning framework that relies on neural networks to learn optimal ongoing behavior for dynamically changing motivation values. First, we show that neural networks implementing Q-learning with motivational salience can navigate environments with dynamic rewards without adjustments in synaptic strengths when the needs of an agent shift. In this setting, our networks may display elements of addictive behaviors. Second, we use a similar framework in a hierarchical manager-agent system to implement a reinforcement learning algorithm with motivation that both infers motivational states and behaves accordingly. Finally, we show that, when trained in a Pavlovian conditioning setting, the responses of the neurons in our model resemble previously published neuronal recordings in the ventral pallidum, a basal ganglia structure involved in motivated behaviors. We conclude that motivation allows Q-learning networks to quickly adapt their behavior to conditions in which the expected reward is modulated by an agent's dynamic needs. Our approach addresses the algorithmic rationale of motivation and takes a step toward better interpretability of behavioral data via inference of motivational dynamics in the brain.
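The core mechanism the abstract describes — a scalar reward computed from reward channels weighted by the agent's current motivation, with the value function conditioned on that motivation — can be illustrated with a minimal tabular Q-learning sketch. The corridor environment, reward layout, motivation vectors, and all hyperparameters below are illustrative assumptions for exposition; the paper's own models are neural networks trained on richer tasks.

```python
import numpy as np

# Hypothetical tabular sketch of Q-learning with motivation. A 1-D corridor
# has "food" at one end and "water" at the other; a motivation vector mu
# weights the two reward channels, so the scalar reward is r = mu . rewards.
# Q-values are indexed by (motivation, state, action): the same table holds
# a distinct policy for each need, so switching needs requires no relearning.

N_STATES = 5                          # positions 0..4 on the corridor
ACTIONS = (-1, +1)                    # step left / step right
REWARD_AT = {0: np.array([1.0, 0.0]),             # food at the left end
             N_STATES - 1: np.array([0.0, 1.0])}  # water at the right end
MOTIVATIONS = [np.array([1.0, 0.0]),  # "hungry": values only food
               np.array([0.0, 1.0])]  # "thirsty": values only water

def train(episodes=2000, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((len(MOTIVATIONS), N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        m = int(rng.integers(len(MOTIVATIONS)))   # need for this episode
        s = int(rng.integers(N_STATES))
        for _ in range(20):
            greedy = int(np.argmax(q[m, s]))
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else greedy
            s2 = int(np.clip(s + ACTIONS[a], 0, N_STATES - 1))
            # Motivation modulates reward: dot product of need and channels.
            r = float(MOTIVATIONS[m] @ REWARD_AT.get(s2, np.zeros(2)))
            done = s2 in REWARD_AT                # episode ends at a reward site
            target = r + (0.0 if done else gamma * q[m, s2].max())
            q[m, s, a] += alpha * (target - q[m, s, a])
            s = s2
            if done:
                break
    return q

if __name__ == "__main__":
    q = train()
    mid = N_STATES // 2
    print("hungry agent at mid steps:", ACTIONS[int(np.argmax(q[0, mid]))])
    print("thirsty agent at mid steps:", ACTIONS[int(np.argmax(q[1, mid]))])
```

Because the Q-table is conditioned on motivation, changing the agent's need from "hungry" to "thirsty" flips its greedy policy immediately, without any further weight updates — the adaptation-without-synaptic-change property the abstract highlights.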


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6e0/7848953/d33264456727/fnsys-14-609316-g001.jpg
