Peyman Khorsand, Alireza Soltani
Department of Psychological and Brain Sciences, Dartmouth College, New Hampshire, United States of America.
PLoS Comput Biol. 2017 Jun 28;13(6):e1005630. doi: 10.1371/journal.pcbi.1005630. eCollection 2017 Jun.
Learning from reward feedback in a changing environment requires a high degree of adaptability, yet the precise estimation of reward information demands slow updates. Within the framework of estimating reward probability, here we investigated how this tradeoff between adaptability and precision can be mitigated via metaplasticity, i.e., synaptic changes that do not always alter synaptic efficacy. Using mean-field and Monte Carlo simulations, we identified 'superior' metaplastic models that can substantially overcome the adaptability-precision tradeoff. These models achieve both adaptability and precision by forming two separate sets of meta-states: reservoirs and buffers. Synapses in reservoir meta-states do not change their efficacy upon reward feedback, whereas those in buffer meta-states can. Rapid changes in efficacy are limited to synapses occupying buffers, creating a bottleneck that reduces noise without significantly decreasing adaptability. In contrast, the more populated reservoirs can generate a strong signal without manifesting any observable plasticity. By comparing the behavior of our model with that of several competing models during a dynamic probability estimation task, we found that superior metaplastic models perform close to optimally over a wider range of model parameters. Finally, we found that metaplastic models are robust to changes in model parameters and that metaplastic transitions are crucial for adaptive learning: replacing them with graded plastic transitions (transitions that change synaptic efficacy) reduces the ability to overcome the adaptability-precision tradeoff. Overall, our results suggest that the ubiquitous unreliability of synaptic changes may reflect metaplasticity, which can provide a robust mechanism for mitigating the tradeoff between adaptability and precision and thereby support adaptive learning.
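The buffer/reservoir mechanism summarized above is easiest to see in simulation. Below is a minimal Monte Carlo sketch of a binary-efficacy metaplastic synapse population, offered purely as an illustration: the number of meta-states m, the transition probabilities p_plastic and p_meta, and the specific transition scheme are hypothetical choices, not the authors' published model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not taken from the paper).
m = 4            # meta-states per efficacy level: |state| == 1 is the buffer,
                 # |state| in 2..m are progressively deeper reservoirs
n_syn = 2000     # synapses in the population
p_plastic = 0.4  # prob. that a buffer synapse flips efficacy on feedback
p_meta = 0.1     # prob. of a metaplastic transition (no efficacy change)

# state < 0: weak efficacy; state > 0: strong efficacy.
state = rng.choice([-1, 1], size=n_syn)

def potentiate(s):
    """Rewarded feedback: only weak synapses in the buffer can flip to
    strong; all other transitions are metaplastic (efficacy unchanged)."""
    u = rng.random(s.size)
    out = s.copy()
    out[(s == -1) & (u < p_plastic)] = 1          # buffer flips efficacy
    out[(s <= -2) & (u < p_meta)] += 1            # weak reservoir -> toward buffer
    out[(s >= 1) & (s < m) & (u < p_meta)] += 1   # strong synapse -> deeper reservoir
    return out

def update(s, reward):
    # Depression mirrors potentiation under a sign flip of the states.
    return potentiate(s) if reward else -potentiate(-s)

# The fraction of strong synapses serves as the reward-probability estimate;
# here the true reward probability switches from 0.8 to 0.2 mid-session.
for t in range(400):
    p_r = 0.8 if t < 200 else 0.2
    state = update(state, rng.random() < p_r)
print("final estimate:", np.mean(state > 0))  # tracks the switch to the low reward rate
```

The sketch captures the bottleneck described in the abstract: only synapses in the buffer (|state| == 1) can change efficacy, while the deeper reservoir states absorb synapses and stabilize the population signal, trading a little adaptability for a large gain in precision.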