Combining backpropagation with Equilibrium Propagation to improve an Actor-Critic reinforcement learning framework.

Authors

Kubo Yoshimasa, Chalmers Eric, Luczak Artur

Affiliations

Canadian Centre for Behavioural Neuroscience, University of Lethbridge, Lethbridge, AB, Canada.

Department of Mathematics and Computing, Mount Royal University, Calgary, AB, Canada.

Publication information

Front Comput Neurosci. 2022 Aug 23;16:980613. doi: 10.3389/fncom.2022.980613. eCollection 2022.

DOI:10.3389/fncom.2022.980613

PMID:36082305

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9446087/

Abstract

Backpropagation (BP) has been used to train neural networks for many years, allowing them to solve a wide variety of tasks such as image classification, speech recognition, and reinforcement learning. However, the biological plausibility of BP as a mechanism of neural learning has been questioned. Equilibrium Propagation (EP) has been proposed as a more biologically plausible alternative and achieves accuracy comparable to BP on the CIFAR-10 image classification task. This study proposes the first EP-based reinforcement learning architecture: an Actor-Critic architecture with the actor network trained by EP. We show that this model can solve the basic control tasks often used as benchmarks for BP-based models. Interestingly, our trained model demonstrates more consistent high-reward behavior than a comparable model trained exclusively by BP.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58c/9446087/0731bfdd5b96/fncom-16-980613-g001.jpg

Similar articles

Combining backpropagation with Equilibrium Propagation to improve an Actor-Critic reinforcement learning framework.

Front Comput Neurosci. 2022 Aug 23;16:980613. doi: 10.3389/fncom.2022.980613. eCollection 2022.

Meta attention for Off-Policy Actor-Critic.

Neural Netw. 2023 Jun;163:86-96. doi: 10.1016/j.neunet.2023.03.024. Epub 2023 Mar 28.

Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

PLoS Comput Biol. 2013 Apr;9(4):e1003024. doi: 10.1371/journal.pcbi.1003024. Epub 2013 Apr 11.

Behavior fusion for deep reinforcement learning.

ISA Trans. 2020 Mar;98:434-444. doi: 10.1016/j.isatra.2019.08.054. Epub 2019 Sep 17.

The role of multisensor data fusion in neuromuscular control of a sagittal arm with a pair of muscles using actor-critic reinforcement learning method.

Technol Health Care. 2004;12(6):425-38.

Biologically-inspired neuronal adaptation improves learning in neural networks.

Commun Integr Biol. 2023 Jan 17;16(1):2163131. doi: 10.1080/19420889.2022.2163131. eCollection 2023.

A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.

Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.

A nonlinear hidden layer enables actor-critic agents to learn multiple paired association navigation.

Cereb Cortex. 2022 Sep 4;32(18):3917-3936. doi: 10.1093/cercor/bhab456.

Neuromuscular control of the point to point and oscillatory movements of a sagittal arm with the actor-critic reinforcement learning method.

Comput Methods Biomech Biomed Engin. 2005 Apr;8(2):103-13. doi: 10.1080/10255840500167952.

Tuning Convolutional Spiking Neural Network With Biologically Plausible Reward Propagation.

IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7621-7631. doi: 10.1109/TNNLS.2021.3085966. Epub 2022 Nov 30.

Cited by

A priority experience replay actor-critic algorithm using self-attention mechanism for strategy optimization of discrete problems.

PeerJ Comput Sci. 2024 Jun 28;10:e2161. doi: 10.7717/peerj-cs.2161. eCollection 2024.

Biologically-inspired neuronal adaptation improves learning in neural networks.

Commun Integr Biol. 2023 Jan 17;16(1):2163131. doi: 10.1080/19420889.2022.2163131. eCollection 2023.

