Christodoulou Chris, Cleanthous Aristodemos
Department of Computer Science, University of Cyprus, 75 Kallipoleos Avenue, P.O. Box 20537, 1678 Nicosia, Cyprus.
Chin J Physiol. 2010 Dec 31;53(6):447-53.
This paper investigates the effectiveness of spiking agents when trained with reinforcement learning (RL) in a challenging multiagent task. In particular, it explores learning through reward-modulated spike-timing-dependent plasticity (STDP) and compares it to reinforcement of stochastic synaptic transmission in the general-sum game of the Iterated Prisoner's Dilemma (IPD). More specifically, a computational model is developed in which two spiking neural networks are implemented as two "selfish" agents learning simultaneously but independently while competing in the IPD game. The purpose of our system (or collective) is to maximise its accumulated reward in the presence of reward-driven competing agents within the collective. This can only be achieved when the agents engage in mutually cooperative behaviour during the IPD. Previously, we successfully applied reinforcement of stochastic synaptic transmission to the IPD game. The current study utilises reward-modulated STDP with an eligibility trace, and the results show that the system managed to exhibit the desired behaviour by establishing mutual cooperation between the agents. Notably, the cooperative outcome was attained after a relatively short learning period, which enhanced the system's accumulation of reward. As in our previous implementation, the successful application of the learning algorithm to the IPD became possible only after we extended it with additional global reinforcement signals that enhance competition at the neuronal level. Moreover, it is shown that learning is enhanced (as indicated by an increased IPD cooperative outcome) through: (i) a strong memory for each agent (regulated by a high eligibility trace time constant) and (ii) firing irregularity, produced by equipping the agents' leaky integrate-and-fire (LIF) neurons with a partial somatic reset mechanism.
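The abstract does not give implementation details, so the following Python sketch only illustrates the general form of reward-modulated STDP with an eligibility trace: pre/post spike pairings are accumulated into a synapse-local trace via a standard exponential STDP window, and the weight changes only when a global reward signal arrives, gated by whatever eligibility remains. All parameter values and names here are hypothetical, not the paper's.

import numpy as np

# Hypothetical parameters; the paper's actual values are not stated in the abstract.
A_PLUS, A_MINUS = 0.01, 0.012   # STDP amplitudes (potentiation / depression)
TAU_STDP = 20.0                  # STDP window time constant (ms)
TAU_E = 500.0                    # eligibility trace time constant (ms); large = "strong memory"
DT = 1.0                         # simulation step (ms)

def stdp_window(delta_t):
    """Exponential STDP window: pre-before-post (delta_t > 0) potentiates."""
    if delta_t > 0:
        return A_PLUS * np.exp(-delta_t / TAU_STDP)
    return -A_MINUS * np.exp(delta_t / TAU_STDP)

class RewardModulatedSynapse:
    """One plastic synapse with an eligibility trace.

    STDP events are stored in the trace instead of changing the weight
    directly; the weight is updated only when a global reward signal
    arrives, proportionally to the remaining eligibility.
    """
    def __init__(self, w=0.5):
        self.w = w
        self.trace = 0.0

    def on_spike_pair(self, t_post_minus_t_pre):
        self.trace += stdp_window(t_post_minus_t_pre)

    def step(self, reward):
        self.w += reward * self.trace * DT        # reward gates the stored STDP
        self.trace -= (self.trace / TAU_E) * DT   # trace decays with TAU_E
        self.w = float(np.clip(self.w, 0.0, 1.0))

A high TAU_E lets credit from a spike pairing survive until the delayed IPD payoff arrives, which is one reading of the abstract's point (i) about strong memory.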
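For point (ii), a minimal sketch of a LIF neuron with a partial somatic reset is given below: after a spike, the membrane potential is reset to a fraction of the threshold rather than to rest, which increases firing irregularity under near-threshold noisy drive. The parameter values (e.g. V_RESET_FRAC) are illustrative assumptions, not the paper's.

import numpy as np

# Hypothetical LIF parameters; the abstract only states that a partial
# somatic reset mechanism is used, not these specific values.
TAU_M = 20.0        # membrane time constant (ms)
V_THRESH = 1.0      # firing threshold (dimensionless units)
V_RESET_FRAC = 0.9  # partial reset: V drops to 90% of threshold, not to rest
DT = 0.1            # time step (ms)

def lif_partial_reset(input_current, v0=0.0):
    """Simulate a LIF neuron whose membrane potential is reset to a
    fraction of threshold after each spike (partial somatic reset)."""
    v, spike_times = v0, []
    for i, I in enumerate(input_current):
        v += (-v + I) * DT / TAU_M          # leaky integration
        if v >= V_THRESH:
            spike_times.append(i * DT)
            v = V_RESET_FRAC * V_THRESH     # partial (not full) reset
    return spike_times

# Example: noisy near-threshold drive yields irregular spike timing.
rng = np.random.default_rng(0)
current = 1.05 + 0.2 * rng.standard_normal(20000)
print(len(lif_partial_reset(current)), "spikes in 2 s")

Because the neuron restarts each interspike interval close to threshold, the noise in the input dominates the timing of the next spike, producing the firing irregularity that the abstract reports as beneficial for learning.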