

Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration-exploitation balance with bio-inspired neural networks.

Affiliation

The Center for Advanced Computer Studies, University of Louisiana at Lafayette, 301 East Lewis Street, P.O. Box 43694, Lafayette, LA 70504-3694, United States of America.

Publication information

Neural Netw. 2022 Jul;151:16-33. doi: 10.1016/j.neunet.2022.03.021. Epub 2022 Mar 23.

DOI: 10.1016/j.neunet.2022.03.021
PMID: 35367735
Abstract

Recent theoretical and experimental works have connected Hebbian plasticity with the reinforcement learning (RL) paradigm, producing a class of trial-and-error learning in artificial neural networks known as neo-Hebbian plasticity. Inspired by the role of the neuromodulator dopamine in synaptic modification, neo-Hebbian RL methods extend unsupervised Hebbian learning rules with value-based modulation to selectively reinforce associations. This reinforcement allows for learning exploitative behaviors and produces RL models with strong biological plausibility. The review begins with coverage of fundamental concepts in rate- and spike-coded models. We introduce Hebbian correlation detection as a basis for modification of synaptic weighting and progress to neo-Hebbian RL models guided solely by extrinsic rewards. We then analyze state-of-the-art neo-Hebbian approaches to the exploration-exploitation balance under the RL paradigm, emphasizing works that employ additional mechanics to modulate that dynamic. Our review of neo-Hebbian RL methods in this context indicates substantial potential for novel improvements in exploratory learning, primarily through stronger incorporation of intrinsic motivators. We provide a number of research suggestions for this pursuit by drawing from modern theories and results in neuroscience and psychology. The exploration-exploitation balance is a central issue in RL research, and this review is the first to focus on it under the neo-Hebbian RL framework.
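The three-factor ("neo-Hebbian") rule the abstract describes — an unsupervised Hebbian correlation term gated by a dopamine-like value signal — can be illustrated with a toy sketch. Everything below (the two-armed bandit setup, the epsilon-greedy exploration, the parameter values, and all names) is an illustrative assumption for this sketch, not a method taken from the review:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a rate-coded linear layer maps a stimulus vector
# to action preferences in a two-armed bandit.
n_inputs, n_actions = 4, 2
W = rng.normal(scale=0.1, size=(n_actions, n_inputs))
eta, epsilon, baseline = 0.1, 0.1, 0.0

def step(stimulus, rewarded_action):
    """One trial: act, collect reward, apply a three-factor weight update."""
    global W, baseline
    values = W @ stimulus
    # Exploration-exploitation balance: exploit greedily most of the time,
    # explore uniformly at random with probability epsilon.
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(values))
    reward = 1.0 if action == rewarded_action else 0.0
    # Neo-Hebbian (three-factor) rule: pre-activity x post-activity x modulator,
    # where the modulator is a dopamine-like reward-prediction signal.
    post = np.zeros(n_actions)
    post[action] = 1.0
    modulator = reward - baseline
    W += eta * modulator * np.outer(post, stimulus)
    baseline += 0.05 * (reward - baseline)  # slowly track the mean reward
    return reward

stimulus = np.array([1.0, 0.0, 0.0, 0.0])
rewards = [step(stimulus, rewarded_action=0) for _ in range(500)]
# After training, the greedy choice for this stimulus should be action 0.
```

Without the modulator the update is purely correlational (classic Hebbian); multiplying by a reward-prediction term is what lets the same rule selectively reinforce exploitative associations, which is the mechanism the review surveys.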


Similar articles

1. Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration-exploitation balance with bio-inspired neural networks.
   Neural Netw. 2022 Jul;151:16-33. doi: 10.1016/j.neunet.2022.03.021. Epub 2022 Mar 23.
2. A reinforcement learning framework for spiking networks with dynamic synapses.
   Comput Intell Neurosci. 2011;2011:869348. doi: 10.1155/2011/869348. Epub 2011 Oct 23.
3. Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture.
   IEEE Trans Neural Netw Learn Syst. 2022 May;33(5):2045-2056. doi: 10.1109/TNNLS.2021.3110281. Epub 2022 May 2.
4. Combining STDP and binary networks for reinforcement learning from images and sparse rewards.
   Neural Netw. 2021 Dec;144:496-506. doi: 10.1016/j.neunet.2021.09.010. Epub 2021 Sep 17.
5. Neuro-Inspired Reinforcement Learning to Improve Trajectory Prediction in Reward-Guided Behavior.
   Int J Neural Syst. 2022 Sep;32(9):2250038. doi: 10.1142/S0129065722500381. Epub 2022 Aug 19.
6. Classic Hebbian learning endows feed-forward networks with sufficient adaptability in challenging reinforcement learning tasks.
   J Neurophysiol. 2021 Jun 1;125(6):2034-2037. doi: 10.1152/jn.00712.2020. Epub 2021 Apr 28.
7. Asymmetric and adaptive reward coding via normalized reinforcement learning.
   PLoS Comput Biol. 2022 Jul 21;18(7):e1010350. doi: 10.1371/journal.pcbi.1010350. eCollection 2022 Jul.
8. Reinforcement Learning in Spiking Neural Networks with Stochastic and Deterministic Synapses.
   Neural Comput. 2019 Dec;31(12):2368-2389. doi: 10.1162/neco_a_01238. Epub 2019 Oct 15.
9. Nutrient-Sensitive Reinforcement Learning in Monkeys.
   J Neurosci. 2023 Mar 8;43(10):1714-1730. doi: 10.1523/JNEUROSCI.0752-22.2022. Epub 2023 Jan 20.
10. Memory-Dependent Computation and Learning in Spiking Neural Networks Through Hebbian Plasticity.
   IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):2551-2562. doi: 10.1109/TNNLS.2023.3341446. Epub 2025 Feb 6.

Cited by

1. On Predictive Planning and Counterfactual Learning in Active Inference.
   Entropy (Basel). 2024 May 31;26(6):484. doi: 10.3390/e26060484.
2. Inhibition of Dopamine Neurons Prevents Incentive Value Encoding of a Reward Cue: With Revelations from Deep Phenotyping.
   J Neurosci. 2023 Nov 1;43(44):7376-7392. doi: 10.1523/JNEUROSCI.0848-23.2023. Epub 2023 Sep 14.
3. Inhibition of dopamine neurons prevents incentive value encoding of a reward cue: With revelations from deep phenotyping.
   bioRxiv. 2023 May 5:2023.05.03.539324. doi: 10.1101/2023.05.03.539324.