Spatio-temporal credit assignment in neuronal population learning.

Affiliation

Department of Physiology, University of Bern, Bern, Switzerland.

Publication information

PLoS Comput Biol. 2011 Jun;7(6):e1002092. doi: 10.1371/journal.pcbi.1002092. Epub 2011 Jun 30.

Abstract

In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate and fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
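The cascade idea in the abstract can be illustrated with a toy sketch for a single synapse. This is not the paper's model: the discrete time steps, the exponential decay, the time constants, the learning rate, and the names `cascade_weight_change` and `make_trial` are all assumptions chosen for illustration. The sketch only shows the ordering described above: a first trace stores pre/post coincidences, a second trace gates that by a decision signal, and the delayed reinforcement finally converts the second trace into a weight change.

```python
import math


def cascade_weight_change(pre, post, decision, reward,
                          tau_pairing=20.0, tau_decision=500.0, lr=0.1):
    """Toy two-trace cascade for one synapse (illustrative assumption only).

    pre, post, decision, reward are equal-length sequences sampled per time step.
    Returns the accumulated weight change over the whole sequence.
    """
    decay_pairing = math.exp(-1.0 / tau_pairing)
    decay_decision = math.exp(-1.0 / tau_decision)
    pairing_trace = 0.0   # stage 1: presynaptic input x postsynaptic event
    decision_trace = 0.0  # stage 2: stage-1 trace x decision signal
    dw = 0.0              # stage 3: stage-2 trace x reinforcement
    for x, y, d, r in zip(pre, post, decision, reward):
        pairing_trace = decay_pairing * pairing_trace + x * y
        decision_trace = decay_decision * decision_trace + d * pairing_trace
        dw += lr * r * decision_trace
    return dw


def make_trial(n_steps, reward_step, reward_value, post_fires=True):
    """Hypothetical trial: pairing at step 0, decision at step 10, one reward."""
    pre = [0.0] * n_steps
    post = [0.0] * n_steps
    decision = [0.0] * n_steps
    reward = [0.0] * n_steps
    pre[0] = 1.0
    post[0] = 1.0 if post_fires else 0.0
    decision[10] = 1.0
    reward[reward_step] = reward_value
    return pre, post, decision, reward


early = cascade_weight_change(*make_trial(500, 100, 1.0))
late = cascade_weight_change(*make_trial(500, 400, 1.0))
punished = cascade_weight_change(*make_trial(500, 100, -1.0))
unpaired = cascade_weight_change(*make_trial(500, 100, 1.0, post_fires=False))
```

In this toy setting a later reward still changes the weight in the same direction, only more weakly, because the slow second trace bridges the delay; a synapse with no pre/post coincidence receives no credit at all.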

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/508b/3127803/df376e8ce2e1/pcbi.1002092.g001.jpg
