
Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration-exploitation balance with bio-inspired neural networks.

Affiliation

The Center for Advanced Computer Studies, University of Louisiana at Lafayette, 301 East Lewis Street, P.O. Box 43694, Lafayette, LA 70504-3694, United States of America.

Publication information

Neural Netw. 2022 Jul;151:16-33. doi: 10.1016/j.neunet.2022.03.021. Epub 2022 Mar 23.

Abstract

Recent theoretical and experimental works have connected Hebbian plasticity with the reinforcement learning (RL) paradigm, producing a class of trial-and-error learning in artificial neural networks known as neo-Hebbian plasticity. Inspired by the role of the neuromodulator dopamine in synaptic modification, neo-Hebbian RL methods extend unsupervised Hebbian learning rules with value-based modulation to selectively reinforce associations. This reinforcement allows for learning exploitative behaviors and produces RL models with strong biological plausibility. The review begins with coverage of fundamental concepts in rate- and spike-coded models. We introduce Hebbian correlation detection as a basis for modification of synaptic weighting and progress to neo-Hebbian RL models guided solely by extrinsic rewards. We then analyze state-of-the-art neo-Hebbian approaches to the exploration-exploitation balance under the RL paradigm, emphasizing works that employ additional mechanisms to modulate that dynamic. Our review of neo-Hebbian RL methods in this context indicates substantial potential for novel improvements in exploratory learning, primarily through stronger incorporation of intrinsic motivators. We provide a number of research suggestions for this pursuit by drawing from modern theories and results in neuroscience and psychology. The exploration-exploitation balance is a central issue in RL research, and this review is the first to focus on it under the neo-Hebbian RL framework.
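The three-factor structure the abstract describes lends itself to a compact illustration: Hebbian co-activity between pre- and post-synaptic units accumulates in an eligibility trace, a dopamine-like reward prediction error gates that trace into an actual weight change, and a softmax temperature plus an optional intrinsic novelty bonus set the exploration-exploitation balance. The sketch below is a minimal illustration of this idea, not the implementation of any method reviewed in the paper; the class name NeoHebbianUnit, all hyperparameters, and the count-based novelty bonus are assumptions introduced here for exposition.

```python
# Minimal sketch of a neo-Hebbian (reward-modulated Hebbian) learner with a
# softmax exploration policy and an optional count-based intrinsic bonus.
# All names and hyperparameters are illustrative assumptions, not taken
# from the reviewed paper.
import numpy as np


class NeoHebbianUnit:
    def __init__(self, n_in, n_out, eta=0.05, trace_decay=0.9,
                 temperature=1.0, novelty_beta=0.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(scale=0.1, size=(n_out, n_in))
        self.trace = np.zeros_like(self.W)  # Hebbian eligibility trace
        self.baseline = 0.0                 # running reward baseline (value estimate)
        self.counts = np.zeros(n_out)       # action visit counts for the novelty bonus
        self.eta, self.trace_decay = eta, trace_decay
        self.temperature, self.novelty_beta = temperature, novelty_beta

    def act(self, x):
        """Softmax action selection; the temperature sets the
        exploration-exploitation balance (higher = more exploratory)."""
        z = (self.W @ x) / self.temperature
        z -= z.max()                        # numerical stability
        p = np.exp(z)
        p /= p.sum()
        a = self.rng.choice(len(p), p=p)
        # Two-factor Hebbian term (post * pre) accumulated as a decaying
        # trace, so delayed rewards can still credit earlier co-activity.
        post = np.eye(len(p))[a]
        self.trace = self.trace_decay * self.trace + np.outer(post, x)
        return a

    def learn(self, a, reward):
        """Third factor: a dopamine-like reward prediction error converts
        the eligibility trace into an actual weight change."""
        self.counts[a] += 1
        # Optional count-based intrinsic bonus: one simple way to fold an
        # intrinsic motivator into the modulatory signal.
        r = reward + self.novelty_beta / np.sqrt(self.counts[a])
        rpe = r - self.baseline             # reward prediction error
        self.baseline += 0.1 * rpe          # slow baseline update
        self.W += self.eta * rpe * self.trace


# Toy usage: reward the action matching the strongest input feature.
unit = NeoHebbianUnit(n_in=4, n_out=4, novelty_beta=0.1, seed=0)
env_rng = np.random.default_rng(1)
for _ in range(2000):
    x = np.abs(env_rng.normal(size=4))
    a = unit.act(x)
    unit.learn(a, reward=float(a == int(np.argmax(x))))
```

The key design point is the separation of factors: the Hebbian trace records which associations were recently active, while the scalar neuromodulatory term decides, after the fact, whether those associations get reinforced. This is the sense in which neo-Hebbian rules "selectively reinforce associations" rather than strengthening every correlation, as a pure Hebbian rule would.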

