Reward-optimizing learning using stochastic release plasticity.

Authors

Sun Yuhao, Liao Wantong, Li Jinhao, Zhang Xinche, Wang Guan, Ma Zhiyuan, Song Sen

Affiliations

Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China.

School of Biomedical Engineering, Tsinghua University, Beijing, China.

Publication information

Front Neural Circuits. 2025 Aug 14;19:1618506. doi: 10.3389/fncir.2025.1618506. eCollection 2025.

DOI:10.3389/fncir.2025.1618506

PMID:40896519

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12390965/

Abstract

Synaptic plasticity underlies adaptive learning in neural systems, offering a biologically plausible framework for reward-driven learning. However, a question remains: how can plasticity rules achieve robustness and effectiveness comparable to error backpropagation? In this study, we introduce Reward-Optimized Stochastic Release Plasticity (RSRP), a learning framework where synaptic release is modeled as a parameterized distribution. Utilizing natural gradient estimation, we derive a synaptic plasticity learning rule that effectively adapts to maximize reward signals. Our approach achieves competitive performance and demonstrates stability in reinforcement learning, comparable to Proximal Policy Optimization (PPO), while attaining accuracy comparable with error backpropagation in digit classification. Additionally, we identify reward regularization as a key stabilizing mechanism and validate our method in biologically plausible networks. Our findings suggest that RSRP offers a robust and effective plasticity learning rule, especially in a discontinuous reinforcement learning paradigm, with potential implications for both artificial intelligence and experimental neuroscience.
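The general idea of treating synaptic weights as samples from a parameterized distribution and following a natural gradient of expected reward can be illustrated with a toy example. The sketch below is not the authors' RSRP rule: it assumes a diagonal Gaussian distribution over weights, a made-up quadratic reward, and simple reward standardization as a stand-in for a stabilizing step. All names (`natural_gradient_step`, `train_demo`, the learning rates, and the floor on `sigma`) are hypothetical.

```python
import numpy as np

def natural_gradient_step(mu, sigma, reward_fn, rng,
                          n_samples=64, lr_mu=0.5, lr_sigma=0.1):
    """One natural-gradient ascent step on E[reward] for w ~ N(mu, sigma^2).

    Assumption: each weight is sampled independently from a Gaussian;
    this is an illustrative choice, not taken from the paper.
    """
    eps = rng.standard_normal((n_samples, mu.size))
    weights = mu + sigma * eps                 # stochastic weight samples
    rewards = np.array([reward_fn(w) for w in weights])

    # Standardize rewards (baseline subtraction and scaling) to reduce variance.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Score-function gradients premultiplied by the inverse Fisher matrix of a
    # diagonal Gaussian: F_mu = 1/sigma^2 and F_sigma = 2/sigma^2.
    grad_mu = sigma * (adv[:, None] * eps).mean(axis=0)
    grad_sigma = 0.5 * sigma * (adv[:, None] * (eps ** 2 - 1)).mean(axis=0)

    new_mu = mu + lr_mu * grad_mu
    # Hypothetical floor keeps the sampling noise strictly positive.
    new_sigma = np.maximum(sigma + lr_sigma * grad_sigma, 1e-3)
    return new_mu, new_sigma, rewards.mean()

def train_demo(seed=0, steps=200):
    """Run the update on a toy reward that peaks at a fixed target vector."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0, 0.5])        # made-up optimum
    reward_fn = lambda w: -np.sum((w - target) ** 2)
    mu, sigma = np.zeros(3), np.ones(3)
    history = []
    for _ in range(steps):
        mu, sigma, mean_reward = natural_gradient_step(mu, sigma, reward_fn, rng)
        history.append(mean_reward)
    return mu, sigma, history, target

if __name__ == "__main__":
    mu, sigma, history, target = train_demo()
    print("learned mean:", mu)
    print("mean reward, first vs. last step:", history[0], history[-1])
```

Because the update uses only sampled rewards and never differentiates `reward_fn`, the same loop would run unchanged with a discontinuous reward, which is the kind of setting the abstract highlights.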


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4fdf/12390965/045de863df4a/fncir-19-1618506-g0001.jpg

Similar articles

Synaptic weight dynamics underlying memory consolidation: Implications for learning rules, circuit organization, and circuit function.

Proc Natl Acad Sci U S A. 2024 Oct 8;121(41):e2406010121. doi: 10.1073/pnas.2406010121. Epub 2024 Oct 4.

Short-term plasticity influences episodic memory recall: an interplay of synaptic traces in a spiking neural network model.

Sci Rep. 2025 Aug 1;15(1):28164. doi: 10.1038/s41598-025-12611-5.

STSF: Spiking Time Sparse Feedback Learning for Spiking Neural Networks.

IEEE Trans Neural Netw Learn Syst. 2025 Jun;36(6):11479-11492. doi: 10.1109/TNNLS.2025.3527700.

Model-based inference of synaptic plasticity rules.

Adv Neural Inf Process Syst. 2024;37:48519-48540.

Dynamic Regulation of the Serotonin-Dopamine Interaction Within a Meta-reinforcement Learning Framework Encompassing the Prefrontal Cortex and Basal Ganglia.

Int J Neural Syst. 2025 Aug;35(8):2550040. doi: 10.1142/S0129065725500406.

Disentangling prediction error and value in a formal test of dopamine's role in reinforcement learning.

Curr Biol. 2025 Aug 18;35(16):4019-4027.e7. doi: 10.1016/j.cub.2025.06.076. Epub 2025 Jul 29.

Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.

Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.

Prescription of Controlled Substances: Benefits and Risks.

Tonic dopamine and biases in value learning linked through a biologically inspired reinforcement learning model.

Nat Commun. 2025 Aug 13;16(1):7529. doi: 10.1038/s41467-025-62280-1.
