Reinforcement learning by Hebbian synapses with adaptive thresholds.

Author information

Pennartz C M

Affiliation

California Institute of Technology, Pasadena, USA.

Publication information

Neuroscience. 1997 Nov;81(2):303-19. doi: 10.1016/s0306-4522(97)00118-8.

DOI:10.1016/s0306-4522(97)00118-8
PMID:9300423
Abstract

A central problem in learning theory is how the vertebrate brain processes reinforcing stimuli in order to master complex sensorimotor tasks. This problem belongs to the domain of supervised learning, in which errors in the response of a neural network serve as the basis for modification of synaptic connectivity in the network and thereby train it on a computational task. The model presented here shows how a reinforcing feedback can modify synapses in a neuronal network according to the principles of Hebbian learning. The reinforcing feedback steers synapses towards long-term potentiation or depression by critically influencing the rise in postsynaptic calcium, in accordance with findings on synaptic plasticity in mammalian brain. An important feature of the model is the dependence of modification thresholds on the previous history of reinforcing feedback processed by the network. The learning algorithm trained networks successfully on a task in which a population vector in the motor output was required to match a sensory stimulus vector presented shortly before. In another task, networks were trained to compute coordinate transformations by combining different visual inputs. The model continued to behave well when simplified units were replaced by single-compartment neurons equipped with several conductances and operating in continuous time. This novel form of reinforcement learning incorporates essential properties of Hebbian synaptic plasticity and thereby shows that supervised learning can be accomplished by a learning rule similar to those used in physiologically plausible models of unsupervised learning. The model can be crudely correlated to the anatomy and electrophysiology of the amygdala, prefrontal and cingulate cortex and has predictive implications for further experiments on synaptic plasticity and learning processes mediated by these areas.
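The abstract describes the learning rule only in words, so the following is a minimal illustrative sketch, not the paper's actual equations. It assumes that the modification threshold is a running average of past reinforcement and that the weight change is the product of presynaptic activity, postsynaptic activity, and the difference between the current reinforcement and that threshold. The function names, the learning rate `lr`, and the averaging rate `rate` are hypothetical choices made for illustration.

```python
def update_threshold(threshold, reinforcement, rate=0.1):
    """Move the modification threshold toward recent reinforcement.

    Assumption: the threshold is an exponential running average of the
    reinforcement history; the abstract only says it depends on that history.
    """
    return threshold + rate * (reinforcement - threshold)


def hebbian_step(weights, pre, post, reinforcement, threshold, lr=0.05):
    """Return new weights after one reinforcement-gated Hebbian update.

    weights[i][j] connects presynaptic unit j to postsynaptic unit i.
    Co-active pairs are potentiated when reinforcement exceeds the
    threshold and depressed when it falls below it.
    """
    gate = reinforcement - threshold  # > 0: potentiation, < 0: depression
    return [
        [w_ij + lr * gate * post[i] * pre[j] for j, w_ij in enumerate(row)]
        for i, row in enumerate(weights)
    ]


if __name__ == "__main__":
    w = [[0.5, 0.5]]          # one postsynaptic unit, two inputs
    pre, post = [1.0, 0.0], [1.0]
    theta = 0.5               # current modification threshold

    # Better-than-usual feedback strengthens the active synapse only.
    print(hebbian_step(w, pre, post, reinforcement=1.0, threshold=theta))
    # Worse-than-usual feedback weakens it.
    print(hebbian_step(w, pre, post, reinforcement=0.0, threshold=theta))
    # The threshold then drifts toward the feedback just received.
    print(update_threshold(theta, 1.0))
```

In this sketch, the same level of reinforcement can produce potentiation early in training and depression later, because the threshold rises as the network is rewarded more often. That history dependence is the property the abstract highlights.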


