Reinforcement learning with modulated spike timing dependent synaptic plasticity.

Author information

Farries Michael A, Fairhall Adrienne L

Affiliation

Department of Biology, University of Texas at San Antonio, San Antonio, TX 78249, USA.

Publication information

J Neurophysiol. 2007 Dec;98(6):3648-65. doi: 10.1152/jn.00364.2007. Epub 2007 Oct 10.

DOI:10.1152/jn.00364.2007

PMID:17928565

Abstract

Spike timing-dependent synaptic plasticity (STDP) has emerged as the preferred framework linking patterns of pre- and postsynaptic activity to changes in synaptic strength. Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general purpose learning. On the other hand, algorithms for reinforcement learning work on a wide variety of problems, but lack an experimentally established neural implementation. Here, we combine these paradigms in a novel model in which a modified version of STDP achieves reinforcement learning. We build this model in stages, identifying a minimal set of conditions needed to make it work. Using a performance-modulated modification of STDP in a two-layer feedforward network, we can train output neurons to generate arbitrarily selected spike trains or population responses. Furthermore, a given network can learn distinct responses to several different input patterns. We also describe in detail how this model might be implemented biologically. Thus our model offers a novel and biologically plausible implementation of reinforcement learning that is capable of training a neural population to produce a very wide range of possible mappings between synaptic input and spiking output.
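The abstract does not state the update rule itself. As a rough illustration of what a "performance-modulated" STDP rule can look like in general, the Python sketch below scales a standard pair-based exponential STDP window by a scalar performance signal measured against a baseline. Everything in it is an assumption made for illustration and is not taken from the paper: the window shape, the function names (`stdp_window`, `eligibility`, `modulated_update`), the parameters (`a_plus`, `a_minus`, `tau`, `lr`), the baseline subtraction, and the weight clipping.

```python
import math


def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Pair-based exponential STDP window (assumed form).

    dt = t_post - t_pre in ms. Positive dt (pre before post) gives
    potentiation, negative dt (post before pre) gives depression.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        return -a_minus * math.exp(dt / tau)
    return 0.0


def eligibility(pre_spikes, post_spikes, **window_params):
    """Sum the STDP window over all pre/post spike pairs of one synapse."""
    return sum(
        stdp_window(t_post - t_pre, **window_params)
        for t_pre in pre_spikes
        for t_post in post_spikes
    )


def modulated_update(weight, pre_spikes, post_spikes, performance, baseline,
                     lr=0.01, w_min=0.0, w_max=1.0):
    """Apply the STDP-derived change scaled by (performance - baseline).

    Better-than-baseline trials reinforce the plain STDP change; worse
    trials reverse its sign. The result is clipped to [w_min, w_max].
    """
    dw = lr * (performance - baseline) * eligibility(pre_spikes, post_spikes)
    return min(w_max, max(w_min, weight + dw))


if __name__ == "__main__":
    # One presynaptic spike at 0 ms followed by a postsynaptic spike at 10 ms.
    w_good = modulated_update(0.5, [0.0], [10.0], performance=1.0, baseline=0.5)
    w_bad = modulated_update(0.5, [0.0], [10.0], performance=0.0, baseline=0.5)
    print(w_good, w_bad)
```

In this sketch the same spike pairing strengthens the synapse after a better-than-baseline trial and weakens it after a worse one, which is the basic sense in which a global performance signal can turn an unsupervised STDP rule into a reinforcement signal.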

Similar articles

Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity.

Neural Comput. 2007 Jun;19(6):1468-502. doi: 10.1162/neco.2007.19.6.1468.

What can a neuron learn with spike-timing-dependent plasticity?

Neural Comput. 2005 Nov;17(11):2337-82. doi: 10.1162/0899766054796888.

Reconciling the STDP and BCM models of synaptic plasticity in a spiking recurrent neural network.

Neural Comput. 2010 Aug;22(8):2059-85. doi: 10.1162/NECO_a_00003-Bush.

An implementation of reinforcement learning based on spike timing dependent plasticity.

Biol Cybern. 2008 Dec;99(6):517-23. doi: 10.1007/s00422-008-0265-6. Epub 2008 Oct 22.

Competitive Hebbian learning through spike-timing-dependent synaptic plasticity.

Nat Neurosci. 2000 Sep;3(9):919-26. doi: 10.1038/78829.

A computational framework for cortical learning.

Biol Cybern. 2004 Jun;90(6):400-9. doi: 10.1007/s00422-004-0487-1. Epub 2004 Jul 22.

Emergence of network structure due to spike-timing-dependent plasticity in recurrent neuronal networks. II. Input selectivity--symmetry breaking.

Biol Cybern. 2009 Aug;101(2):103-14. doi: 10.1007/s00422-009-0320-y. Epub 2009 Jun 18.

Learning real-world stimuli in a neural network with spike-driven synaptic dynamics.

Neural Comput. 2007 Nov;19(11):2881-912. doi: 10.1162/neco.2007.19.11.2881.

Emergence of network structure due to spike-timing-dependent plasticity in recurrent neuronal networks. I. Input selectivity--strengthening correlated input pathways.

Biol Cybern. 2009 Aug;101(2):81-102. doi: 10.1007/s00422-009-0319-4. Epub 2009 Jun 18.

Cited by

Dual neuromodulatory dynamics underlie birdsong learning.

Nature. 2025 May;641(8063):690-698. doi: 10.1038/s41586-025-08694-9. Epub 2025 Mar 12.

Sleep prevents catastrophic forgetting in spiking neural networks by forming a joint synaptic weight representation.

PLoS Comput Biol. 2022 Nov 18;18(11):e1010628. doi: 10.1371/journal.pcbi.1010628. eCollection 2022 Nov.

Training spiking neuronal networks to perform motor control using reinforcement and evolutionary learning.

Front Comput Neurosci. 2022 Sep 30;16:1017284. doi: 10.3389/fncom.2022.1017284. eCollection 2022.

Adaptive control of synaptic plasticity integrates micro- and macroscopic network function.

Neuropsychopharmacology. 2023 Jan;48(1):121-144. doi: 10.1038/s41386-022-01374-6. Epub 2022 Aug 29.

Training a spiking neuronal network model of visual-motor cortex to play a virtual racket-ball game using reinforcement learning.

PLoS One. 2022 May 11;17(5):e0265808. doi: 10.1371/journal.pone.0265808. eCollection 2022.

Mirror neurons are modulated by grip force and reward expectation in the sensorimotor cortices (S1, M1, PMd, PMv).

Sci Rep. 2021 Aug 5;11(1):15959. doi: 10.1038/s41598-021-95536-z.

Artificial Development by Reinforcement Learning Can Benefit From Multiple Motivations.

Front Robot AI. 2019 Feb 14;6:6. doi: 10.3389/frobt.2019.00006. eCollection 2019.

Single-cell transcriptomic evidence for dense intracortical neuropeptide networks.

Elife. 2019 Nov 11;8:e47889. doi: 10.7554/eLife.47889.

Resource Selection in Cognitive Networks With Spiking Neural Networks.

IEEE Trans Cogn Commun Netw. 2018 Aug 14;4(4). doi: 10.1109/TCCN.2018.2865387.

Reinforcement Learning With Low-Complexity Liquid State Machines.

Front Neurosci. 2019 Aug 27;13:883. doi: 10.3389/fnins.2019.00883. eCollection 2019.
