Coexistence of reward and unsupervised learning during the operant conditioning of neural firing rates.

Author Information

Kerr Robert R, Grayden David B, Thomas Doreen A, Gilson Matthieu, Burkitt Anthony N

Affiliations

NeuroEngineering Laboratory, Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, Australia; Centre for Neural Engineering, University of Melbourne, Melbourne, Australia; NICTA, Victoria Research Lab, University of Melbourne, Melbourne, Australia.

NeuroEngineering Laboratory, Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, Australia; Centre for Neural Engineering, University of Melbourne, Melbourne, Australia; NICTA, Victoria Research Lab, University of Melbourne, Melbourne, Australia; Bionics Institute, Melbourne, Australia.

Publication Information

PLoS One. 2014 Jan 27;9(1):e87123. doi: 10.1371/journal.pone.0087123. eCollection 2014.

Abstract

A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments.
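The abstract's central mechanism is a reward-modulated STDP rule in which the reward signal strengthens LTP more than LTD, while learning continues (unsupervised) even when no reward is present. Below is a minimal, hedged sketch of such an update rule, assuming a standard additive pair-based STDP model; the parameter names (A_plus, A_minus, k_ltp, k_ltd) and values are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

# Illustrative RSTDP weight update for a single pre/post spike pair.
# Assumption: reward multiplicatively scales the LTP and LTD amplitudes,
# with a larger gain on LTP (k_ltp > k_ltd), as the abstract requires
# for firing-rate reinforcement to emerge.

A_plus = 0.005      # baseline LTP amplitude (unsupervised component)
A_minus = 0.00525   # baseline LTD amplitude
tau_plus = 20.0     # LTP time constant (ms)
tau_minus = 20.0    # LTD time constant (ms)
k_ltp = 1.5         # reward gain on LTP
k_ltd = 1.0         # reward gain on LTD (weaker than k_ltp)

def rstdp_dw(delta_t, reward):
    """Weight change for one spike pair.

    delta_t = t_post - t_pre in ms; reward is the reward signal present
    when the pair occurs. With reward == 0 the rule reduces to ordinary
    unsupervised STDP, so plasticity persists without reward.
    """
    if delta_t >= 0:   # pre before post -> potentiation
        return (1.0 + k_ltp * reward) * A_plus * np.exp(-delta_t / tau_plus)
    else:              # post before pre -> depression
        return -(1.0 + k_ltd * reward) * A_minus * np.exp(delta_t / tau_minus)

# Same pairing with and without reward: LTP is boosted more than LTD.
print(rstdp_dw(+10.0, reward=0.0))   # unsupervised LTP
print(rstdp_dw(+10.0, reward=1.0))   # reward-boosted LTP (larger)
print(rstdp_dw(-10.0, reward=1.0))   # reward-boosted LTD (boosted less)
```

This sketch only illustrates the asymmetric reward modulation of LTP versus LTD described in the abstract; the paper's full model additionally addresses weight dependence, spike-timing correlations, and irregular firing of the reinforced neuron.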

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d512/3903641/e7a9c806b4a3/pone.0087123.g001.jpg
