Kubo Yoshimasa, Chalmers Eric, Luczak Artur
Canadian Centre for Behavioural Neuroscience, University of Lethbridge, Lethbridge, AB, Canada.
Department of Mathematics and Computing, Mount Royal University, Calgary, AB, Canada.
Front Comput Neurosci. 2022 Aug 23;16:980613. doi: 10.3389/fncom.2022.980613. eCollection 2022.
Backpropagation (BP) has been used to train neural networks for many years, allowing them to solve a wide variety of tasks such as image classification, speech recognition, and reinforcement learning. However, the biological plausibility of BP as a mechanism of neural learning has been questioned. Equilibrium Propagation (EP) has been proposed as a more biologically plausible alternative, and it achieves accuracy comparable to BP on the CIFAR-10 image classification task. This study proposes the first EP-based reinforcement learning architecture: an Actor-Critic architecture in which the actor network is trained by EP. We show that this model can solve the basic control tasks often used as benchmarks for BP-based models. Interestingly, our trained model demonstrates more consistent high-reward behavior than a comparable model trained exclusively by BP.