Kawato Mitsuo, Samejima Kazuyuki
ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan.
Curr Opin Neurobiol. 2007 Apr;17(2):205-12. doi: 10.1016/j.conb.2007.03.004. Epub 2007 Mar 19.
Reinforcement learning algorithms have provided some of the most influential computational theories for behavioral learning that depends on reward and penalty. After briefly reviewing supporting experimental data, this paper tackles three difficult theoretical issues that remain to be explored. First, plain reinforcement learning is much too slow to be considered a plausible brain model. Second, although the temporal-difference error has an important role both in theory and in experiments, how to compute it remains an enigma. Third, explaining the function of all brain areas, including the cerebral cortex, cerebellum, brainstem and basal ganglia, seems to necessitate a new computational framework. Computational studies that emphasize meta-parameters, hierarchy, modularity and supervised learning to resolve these issues are reviewed here, together with the related experimental data.
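For reference on the second issue, the following is a minimal sketch of how the temporal-difference error is computed in standard tabular TD(0) learning, delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t). This is textbook reinforcement learning, not the authors' proposed scheme; the five-state chain environment, reward function, and parameter values are illustrative assumptions.

```python
import numpy as np

# Minimal TD(0) sketch of the temporal-difference error:
#   delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)
# The 5-state chain task, reward placement, and parameters below are
# illustrative assumptions, not taken from the paper under review.

n_states = 5
V = np.zeros(n_states)           # value estimates V(s); terminal state stays 0
gamma, alpha = 0.9, 0.1          # discount factor, learning rate

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    while s < n_states - 1:
        s_next = min(s + int(rng.integers(1, 3)), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        delta = r + gamma * V[s_next] - V[s]         # temporal-difference error
        V[s] += alpha * delta                        # value update driven by delta
        s = s_next

print(V)  # values decay geometrically with distance from the goal
```

The enigma the abstract points to is not this arithmetic but how neural circuitry could supply the quantities it requires, in particular the predicted value of the next state before that state's outcome is observed.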