Liang Xingyue, Wu Qiaoyun, Liu Wenzhang, Zhou Yun, Tan Chunyu, Yin Hongfu, Sun Changyin
School of Artificial Intelligence, Anhui University, Hefei, 230601, Anhui, China; Engineering Research Center of Autonomous Unmanned System Technology, Ministry of Education, Hefei, 230601, Anhui, China; Anhui Provincial Engineering Research Center for Unmanned Systems and Intelligent Technology, Hefei, 230601, Anhui, China.
School of Artificial Intelligence, Anhui University, Hefei, 230601, Anhui, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230601, Anhui, China.
Neural Netw. 2025 Apr;184:107054. doi: 10.1016/j.neunet.2024.107054. Epub 2024 Dec 19.
Deep reinforcement learning (DRL) exploits the powerful representational capabilities of deep neural networks (DNNs) and has achieved significant success. However, compared to DNNs, spiking neural networks (SNNs), which operate on binary signals, more closely resemble the biological characteristics of efficient learning observed in the brain. In SNNs, spiking neurons exhibit complex dynamic characteristics and learn based on principles of biological plasticity. As in the brain's efficient computational mechanisms, information encoding plays a critical role in these networks. We propose an intrinsic plasticity coding improved spiking actor network (IP-SAN) for reinforcement learning (RL) to achieve effective decision-making. The IP-SAN integrates adaptive population coding at the network level with dynamic spiking neuron coding at the neuron level, improving spatiotemporal state representation and promoting more accurate biological simulation. Experimental results show that our IP-SAN outperforms several state-of-the-art methods on five continuous control tasks.
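The two coding levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' IP-SAN implementation: the abstract does not specify the encoder or the neuron model, so the Gaussian receptive fields, the leaky integrate-and-fire (LIF) dynamics, and all names and constants below (`population_encode`, `lif_spikes`, `sigma`, `decay`, `threshold`) are assumptions chosen for illustration. The parameters are fixed here; in an adaptive scheme they would presumably be learned.

```python
import numpy as np

def population_encode(state, centers, sigma):
    """Network level: map each state dimension onto a population of
    Gaussian receptive fields (assumed encoder), giving values in (0, 1]."""
    # state: (state_dim,), centers: (state_dim, pop_size)
    diff = state[:, None] - centers
    return np.exp(-0.5 * (diff / sigma) ** 2).reshape(-1)

def lif_spikes(drive, timesteps=5, decay=0.5, threshold=0.5):
    """Neuron level: leaky integrate-and-fire neurons (assumed model) driven
    by a constant input; returns a (timesteps, n_neurons) binary spike train."""
    v = np.zeros_like(drive)                 # membrane potentials
    out = np.zeros((timesteps, drive.size))
    for t in range(timesteps):
        v = decay * v + drive                # leak, then integrate the input
        fired = v >= threshold
        out[t] = fired                       # binary spikes (0.0 or 1.0)
        v = np.where(fired, 0.0, v)          # hard reset after a spike
    return out

# Hypothetical 2-D state, with 5 receptive fields per dimension on [-1, 1].
state = np.array([0.0, 0.5])
centers = np.tile(np.linspace(-1.0, 1.0, 5), (2, 1))
activations = population_encode(state, centers, sigma=0.5)
spikes = lif_spikes(activations)
```

In this sketch a neuron whose receptive field is centered on the current state value fires at every timestep, while neurons tuned far from it stay silent, so the state is represented both by which neurons fire (spatial) and by how often they fire over the time window (temporal).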