Toward robust and scalable deep spiking reinforcement learning.

Authors

Akl Mahmoud, Ergene Deniz, Walter Florian, Knoll Alois

Affiliation

Chair of Robotics, Artificial Intelligence and Embedded Systems, TUM School of Computation, Information and Technology, Technische Universität München, Munich, Germany.

Publication

Front Neurorobot. 2023 Jan 20;16:1075647. doi: 10.3389/fnbot.2022.1075647. eCollection 2022.

DOI:10.3389/fnbot.2022.1075647
PMID:36742191
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9894879/
Abstract

Deep reinforcement learning (DRL) combines reinforcement learning algorithms with deep neural networks (DNNs). Spiking neural networks (SNNs) have been shown to be a biologically plausible and energy-efficient alternative to DNNs. Since the introduction of surrogate gradient approaches, which overcome the discontinuity in the spike function, SNNs can be trained with the backpropagation through time (BPTT) algorithm. While SNNs have been largely explored on supervised learning problems, little work has investigated their use as function approximators in DRL. Here we show how SNNs can be applied to different DRL algorithms such as Deep Q-Network (DQN) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) for discrete and continuous action space environments, respectively. We found that SNNs are sensitive to the additional hyperparameters introduced by spiking neuron models, such as current and voltage decay factors and firing thresholds, and that extensive hyperparameter tuning is inevitable. However, we show that increasing the simulation time of SNNs, as well as applying a two-neuron encoding to the input observations, helps reduce the sensitivity to the membrane parameters. Furthermore, we show that randomizing the membrane parameters, instead of selecting uniform values for all neurons, has stabilizing effects on the training. We conclude that SNNs can be utilized for learning complex continuous control problems with state-of-the-art DRL algorithms. While the training complexity increases, the resulting SNNs can be directly executed on neuromorphic processors and potentially benefit from their high energy efficiency.
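
The three ingredients named in the abstract (two-neuron input encoding, randomized membrane parameters, and simulation time) can be illustrated with a small forward-only sketch. This is not the authors' implementation. It assumes that "two-neuron encoding" means splitting each signed observation into a positive and a negative non-negative channel, and that the neuron is a current-based leaky integrate-and-fire unit with a hard reset. The function names (`two_neuron_encode`, `sample_membrane_params`, `simulate_lif`) and the sampling ranges are hypothetical and chosen only for illustration. Training with surrogate gradients is not shown.

```python
import random


def two_neuron_encode(obs):
    """Assumed encoding: each signed value becomes a (positive, negative) pair."""
    encoded = []
    for x in obs:
        encoded.extend([max(x, 0.0), max(-x, 0.0)])
    return encoded


def sample_membrane_params(n, rng, decay_range=(0.2, 0.8), thresh_range=(0.5, 1.0)):
    """Draw per-neuron decay factors and thresholds (ranges are illustrative)."""
    return [
        {
            "current_decay": rng.uniform(*decay_range),
            "voltage_decay": rng.uniform(*decay_range),
            "threshold": rng.uniform(*thresh_range),
        }
        for _ in range(n)
    ]


def simulate_lif(inputs, params, sim_steps):
    """Drive one LIF neuron per input with a constant current; return spike counts."""
    n = len(inputs)
    current = [0.0] * n
    voltage = [0.0] * n
    counts = [0] * n
    for _ in range(sim_steps):
        for i, p in enumerate(params):
            # Synaptic current leaks, then integrates the input.
            current[i] = p["current_decay"] * current[i] + inputs[i]
            # Membrane voltage leaks, then integrates the current.
            voltage[i] = p["voltage_decay"] * voltage[i] + current[i]
            if voltage[i] >= p["threshold"]:
                counts[i] += 1
                voltage[i] = 0.0  # hard reset after a spike (assumption)
    return counts


rng = random.Random(0)
encoded = two_neuron_encode([0.5, -2.0])  # [0.5, 0.0, 0.0, 2.0]
params = sample_membrane_params(len(encoded), rng)
short_run = simulate_lif(encoded, params, sim_steps=5)
long_run = simulate_lif(encoded, params, sim_steps=20)
print(encoded, short_run, long_run)
```

In this sketch a longer simulation window gives each neuron more steps over which to express its input as a spike count, and the positive/negative split keeps every input non-negative, so a negative observation still produces spikes instead of being silenced. Whether these effects account for the reduced hyperparameter sensitivity reported in the abstract is not something this toy example can show.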
