Cogn Neurodyn. 2010 Jun;4(2):91-105. doi: 10.1007/s11571-010-9109-x. Epub 2010 Mar 21.
Reinforcement learning is ubiquitous. Unlike other forms of learning, it involves the processing of fast yet content-poor feedback information to correct assumptions about the nature of a task or of a set of stimuli. This feedback information is often delivered as generic rewards or punishments, and has little to do with the stimulus features to be learned. How can such low-content feedback support such efficient learning? Through a review of existing neuro-computational models of reinforcement learning, we suggest that the efficiency of this type of learning resides in the dynamic and synergistic cooperation of brain systems that use different levels of computation. The implementation of reward signals at the synaptic, cellular, network and system levels gives the organism the robustness, adaptability and processing speed required for evolutionary and behavioral success.