Sun Yuhao, Liao Wantong, Li Jinhao, Zhang Xinche, Wang Guan, Ma Zhiyuan, Song Sen
Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China.
School of Biomedical Engineering, Tsinghua University, Beijing, China.
Front Neural Circuits. 2025 Aug 14;19:1618506. doi: 10.3389/fncir.2025.1618506. eCollection 2025.
Synaptic plasticity underlies adaptive learning in neural systems, offering a biologically plausible framework for reward-driven learning. However, a question remains: how can plasticity rules achieve robustness and effectiveness comparable to error backpropagation? In this study, we introduce Reward-Optimized Stochastic Release Plasticity (RSRP), a learning framework in which synaptic release is modeled as a parameterized distribution. Using natural gradient estimation, we derive a synaptic plasticity rule that adapts effectively to maximize reward signals. Our approach achieves performance and stability in reinforcement learning competitive with Proximal Policy Optimization (PPO), while attaining accuracy comparable to error backpropagation in digit classification. Additionally, we identify reward regularization as a key stabilizing mechanism and validate our method in biologically plausible networks. Our findings suggest that RSRP offers a robust and effective plasticity learning rule, especially in discontinuous reinforcement learning paradigms, with potential implications for both artificial intelligence and experimental neuroscience.
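The abstract's core idea, modeling synaptic release as a parameterized distribution and updating it with a natural-gradient estimate of the reward gradient, can be illustrated with a minimal sketch. This is an assumption-laden toy (a diagonal-Gaussian, separable-NES-style update on a quadratic reward), not the authors' exact RSRP rule; the target weights, population size, and learning rates below are invented for illustration. It also shows the reward-standardization step that plays the role of the reward regularization the abstract identifies as stabilizing.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.5, -1.2, 2.0])  # hypothetical "optimal" synaptic weights
mu = np.zeros(3)                     # mean of the release distribution
log_sigma = np.zeros(3)              # log std of the release distribution

def reward(w):
    # Toy reward: higher when the sampled weights lie near the target.
    return -np.sum((w - target) ** 2)

pop, lr_mu, lr_sigma = 50, 0.5, 0.05
for step in range(300):
    eps = rng.standard_normal((pop, 3))        # stochastic release noise
    samples = mu + np.exp(log_sigma) * eps     # sampled synaptic weights
    r = np.array([reward(w) for w in samples])
    r = (r - r.mean()) / (r.std() + 1e-8)      # reward regularization (standardize)
    # Natural-gradient updates for a diagonal Gaussian (separable NES form):
    mu += lr_mu * np.exp(log_sigma) * (r @ eps) / pop
    log_sigma += lr_sigma * (r @ (eps ** 2 - 1)) / (2 * pop)

print(np.round(mu, 2))  # the distribution mean drifts toward the target weights
```

Standardizing rewards within each population keeps the update scale bounded regardless of the raw reward magnitude, which is one plausible reading of why reward regularization stabilizes learning in this family of rules.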