Neuron as a reward-modulated combinatorial switch and a model of learning behavior.

Publication information

Neural Netw. 2013 Oct;46:62-74. doi: 10.1016/j.neunet.2013.04.010. Epub 2013 May 6.

DOI:10.1016/j.neunet.2013.04.010

Abstract

This paper proposes a neuronal circuitry layout and synaptic plasticity principles that allow the (pyramidal) neuron to act as a "combinatorial switch". Namely, the neuron learns to be more prone to generate spikes given those combinations of firing input neurons for which a previous spiking of the neuron had been followed by a positive global reward signal. The reward signal may be mediated by certain modulatory hormones or neurotransmitters, e.g., dopamine. More generally, a trial-and-error learning paradigm is suggested in which a global reward signal triggers long-term enhancement or weakening of a neuron's spiking response to the preceding neuronal input firing pattern. Thus, rewards provide a feedback pathway that informs neurons whether their spiking was beneficial or detrimental for a particular input combination. The neuron's ability to discern specific combinations of firing input neurons is achieved through a random or predetermined spatial distribution of input synapses on dendrites that creates synaptic clusters that represent various permutations of input neurons. The corresponding dendritic segments, or the enclosed individual spines, are capable of being particularly excited, due to local sigmoidal thresholding involving voltage-gated channel conductances, when the segment's excitatory inputs coincide in time and no inhibitory inputs arrive concurrently. Such nonlinear excitation corresponds to a particular firing combination of input neurons, and it is posited that the excitation strength encodes the combinatorial memory and is regulated by long-term plasticity mechanisms. It is also suggested that the spine calcium influx that may result from the spatiotemporal synaptic input coincidence may cause the spine head actin filaments to undergo mechanical (muscle-like) contraction, with the ensuing cytoskeletal deformation transmitted to the axon initial segment where it may modulate the global neuron firing threshold. The tasks of pattern classification and generalization are discussed within the presented framework.
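The learning rule described in the abstract can be illustrated with a small toy model. The sketch below is not the paper's model. It assumes that each dendritic cluster watches a fixed subset of inputs, that cluster excitation is a steep sigmoid of the fraction of its inputs that fire together, and that a global reward delivered after a spike raises or lowers the excitation strength of whichever clusters were active. All names (`SwitchNeuron`, `train_demo`), the threshold of 0.75, the slope of 20, the learning rate, and the strength cap are hypothetical choices made for illustration; inhibitory inputs and the proposed cytoskeletal mechanism are left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SwitchNeuron:
    """Toy reward-modulated neuron; an illustrative assumption, not the paper's model."""

    def __init__(self, clusters, rate=0.5, w_max=5.0, seed=0):
        self.clusters = [frozenset(c) for c in clusters]  # inputs seen by each cluster
        self.strength = [0.0] * len(self.clusters)        # learned excitation strength
        self.rate = rate                                  # hypothetical learning rate
        self.w_max = w_max                                # hypothetical cap on strength
        self.rng = random.Random(seed)
        self.last_acts = None
        self.last_spiked = False

    def cluster_acts(self, active):
        # Steep local threshold: a cluster is strongly excited only when
        # (nearly) all of its inputs fire together.
        active = frozenset(active)
        return [sigmoid(20.0 * (len(c & active) / len(c) - 0.75))
                for c in self.clusters]

    def spike_probability(self, active):
        acts = self.cluster_acts(active)
        return sigmoid(sum(w * a for w, a in zip(self.strength, acts)))

    def present(self, active):
        # Stochastic spiking supplies the "trial" in trial-and-error learning.
        self.last_acts = self.cluster_acts(active)
        drive = sum(w * a for w, a in zip(self.strength, self.last_acts))
        self.last_spiked = self.rng.random() < sigmoid(drive)
        return self.last_spiked

    def reward(self, r):
        # A global reward only matters if the neuron spiked on the last input;
        # it adjusts each cluster in proportion to how excited that cluster was.
        if not self.last_spiked:
            return
        for i, a in enumerate(self.last_acts):
            w = self.strength[i] + self.rate * r * a
            self.strength[i] = max(-self.w_max, min(self.w_max, w))


def train_demo(trials=200, seed=0):
    neuron = SwitchNeuron(clusters=[(0, 1), (2, 3), (0, 2)], seed=seed)
    # Spiking on inputs {0, 1} is rewarded; spiking on {2, 3} is punished.
    rewards = {frozenset({0, 1}): 1.0, frozenset({2, 3}): -1.0}
    patterns = list(rewards)
    for t in range(trials):
        pattern = patterns[t % 2]
        if neuron.present(pattern):
            neuron.reward(rewards[pattern])
    return neuron


neuron = train_demo()
print("P(spike | inputs 0,1):", neuron.spike_probability({0, 1}))
print("P(spike | inputs 2,3):", neuron.spike_probability({2, 3}))
```

After training, the spike probability for the rewarded combination should sit well above chance and the probability for the punished combination well below it, while the cluster `(0, 2)`, which never sees both of its inputs together, stays close to its initial strength. The strength cap is only there to keep the toy bounded and has no counterpart in the abstract.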

Similar articles

Learning by the dendritic prediction of somatic spiking.

Neuron. 2014 Feb 5;81(3):521-8. doi: 10.1016/j.neuron.2013.11.030.

Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity.

Neural Comput. 2007 Jun;19(6):1468-502. doi: 10.1162/neco.2007.19.6.1468.

Solving the distal reward problem with rare correlations.

Neural Comput. 2013 Apr;25(4):940-78. doi: 10.1162/NECO_a_00419. Epub 2013 Jan 22.

Inhibitory synaptic plasticity regulates pyramidal neuron spiking in the rodent hippocampus.

Neuroscience. 2008 Jul 31;155(1):64-75. doi: 10.1016/j.neuroscience.2008.05.009. Epub 2008 May 21.

Integration of synchronous synaptic input in CA1 pyramidal neuron depends on spatial and temporal distributions of the input.

Hippocampus. 2013 Jan;23(1):87-99. doi: 10.1002/hipo.22061. Epub 2012 Sep 21.

Emergence of network structure due to spike-timing-dependent plasticity in recurrent neuronal networks. II. Input selectivity--symmetry breaking.

Biol Cybern. 2009 Aug;101(2):103-14. doi: 10.1007/s00422-009-0320-y. Epub 2009 Jun 18.

Neurons tune to the earliest spikes through STDP.

Neural Comput. 2005 Apr;17(4):859-79. doi: 10.1162/0899766053429390.

A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback.

PLoS Comput Biol. 2008 Oct;4(10):e1000180. doi: 10.1371/journal.pcbi.1000180. Epub 2008 Oct 10.

A spiking neural network model of an actor-critic learning agent.

Neural Comput. 2009 Feb;21(2):301-39. doi: 10.1162/neco.2008.08-07-593.

Cited by

An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction.

Front Neural Circuits. 2024 Mar 5;18:1280604. doi: 10.3389/fncir.2024.1280604. eCollection 2024.
