
Dynamical model of salience gated working memory, action selection and reinforcement based on basal ganglia and dopamine feedback.

Author Information

Adam Ponzi

Affiliation

Laboratory for Dynamics of Emergent Intelligence, RIKEN Brain Science Institute, Wako, Saitama, Japan.

Publication Information

Neural Netw. 2008 Mar-Apr;21(2-3):322-30. doi: 10.1016/j.neunet.2007.12.040. Epub 2007 Dec 31.

Abstract

A simple working memory model based on recurrent network activation is proposed, and its application to the selection and reinforcement of an action is demonstrated as a solution to the temporal credit assignment problem. Reactivation of recent salient cue states is generated and maintained as a form of salience-gated, recurrently active working memory, while lower-salience distractors are ignored. Cue reactivation during the action selection period allows the cue to select an action, while its reactivation at the reward period allows reinforcement of the action selected by the reactivated state, which is necessarily the action that led to the reward being found. Down-gating of the external input during reactivation and maintenance prevents interference. A double winner-take-all system, which selects only one cue and only one action, allows the targeting of the cue-action allocation to be modified. This targeting works both to reinforce a correct cue-action allocation and to punish the allocation when cue-action allocations change. Here we suggest a firing-rate neural network implementation of this system based on basal ganglia anatomy, with input from a cortical association layer where reactivations are generated by signals from the thalamus. Striatal medium spiny neurons represent actions. Auto-catalytic feedback from a dopamine reward signal modulates three-way Hebbian long-term potentiation and depression at the cortico-striatal synapses, which represent the cue-action associations. The model is illustrated by numerical simulations of a simple example: associating a cue signal with the correct action to obtain reward after a delay period, typical of primate cue-reward tasks. Through learning, the model shows a transition from an exploratory phase, where actions are generated randomly, to a stable directed phase, where the animal always chooses the correct action for each experienced state. When cue-action allocations change, we show that the model notices this, punishes the incorrect cue-action allocations, and discovers the correct ones.
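The abstract states the learning rule only in words. As a rough illustration, the sketch below implements, under stated assumptions, the two ingredients it names: a double winner-take-all that commits to one cue and one action, and a three-factor (presynaptic, postsynaptic, dopamine) Hebbian update at the cue-action weights. All sizes, learning rates, and the toy task are hypothetical; the paper's actual model is a firing-rate network with thalamic reactivation dynamics that this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and parameters (assumptions, not taken from the paper).
N_CUES, N_ACTIONS = 4, 4
W = rng.uniform(0.4, 0.6, size=(N_CUES, N_ACTIONS))  # cortico-striatal weights


def select_action(cue, noise=0.1):
    """Winner-take-all over striatal 'action' units driven by the reactivated
    cue; the noise term produces the early exploratory phase."""
    drive = W[cue] + noise * rng.standard_normal(N_ACTIONS)
    return int(np.argmax(drive))  # one cue, one action: the double WTA


def three_factor_update(cue, action, dopamine, lr=0.2):
    """Dopamine-gated Hebbian LTP/LTD: only the synapse joining the
    reactivated cue (pre) and the selected action (post) is eligible, so
    credit lands on the action that actually led to the reward."""
    pre = np.zeros(N_CUES)
    post = np.zeros(N_ACTIONS)
    pre[cue], post[action] = 1.0, 1.0
    W[...] = np.clip(W + lr * dopamine * np.outer(pre, post), 0.0, 1.0)


# Toy cue-reward task (hypothetical): cue i is rewarded only for action i.
for trial in range(500):
    cue = int(rng.integers(N_CUES))
    act = select_action(cue)
    da = 1.0 if act == cue else -0.5  # positive DA reinforces, negative punishes
    three_factor_update(cue, act, da)

# After learning, each cue should map to its correct action.
print([int(np.argmax(W[c])) for c in range(N_CUES)])
```

In this toy version, early trials are effectively random because the noisy winner-take-all acts on nearly uniform weights; as dopamine-gated updates accumulate, the weight matrix saturates toward the correct cue-action pairings, mirroring the exploratory-to-directed transition described above. If the allocations were then swapped, negative dopamine on the now-wrong pairings would depress those weights and exploration would resume.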

