Ishii Shin, Yoshida Wako, Yoshimoto Junichiro
Nara Institute of Science and Technology, Ikoma, Japan.
Neural Netw. 2002 Jun-Jul;15(4-6):665-87. doi: 10.1016/s0893-6080(02)00056-4.
In reinforcement learning (RL), the trade-off between exploitation and exploration has long been an important issue. This paper presents a new method for controlling the balance between exploitation and exploration. Our learning scheme is based on model-based RL, in which Bayesian inference with a forgetting effect estimates the state-transition probability of the environment. The balance parameter, which corresponds to the randomness of action selection, is controlled on the basis of variation in action results and perception of environmental change. When applied to maze tasks, our method successfully obtains good control by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] have suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in the real brain, and that this balance may correspond to the level of the animal's selective attention. Based on this scenario, we also discuss a possible implementation in the brain.
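To make the two ingredients of the abstract concrete, here is a minimal, hypothetical sketch (not the paper's actual algorithm): Bayesian (Dirichlet) estimation of state-transition probabilities with an exponential forgetting factor, and Boltzmann (softmax) action selection whose inverse temperature plays the role of the exploitation-exploration balance parameter. All names (`forgetting`, `beta`, the state/action sizes) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
forgetting = 0.9  # 0 < forgetting < 1: older observations decay away
# Dirichlet pseudo-counts for P(s' | s, a); a uniform prior of 1 per outcome
counts = np.ones((n_states, n_actions, n_states))

def observe_transition(s, a, s_next):
    """Bayesian update with forgetting: decay old counts, add new evidence."""
    counts[s, a] *= forgetting
    counts[s, a, s_next] += 1.0

def transition_probs(s, a):
    """Posterior mean of the estimated transition distribution."""
    return counts[s, a] / counts[s, a].sum()

def softmax_action(q_values, beta):
    """Boltzmann selection: large beta exploits, small beta explores."""
    z = beta * (q_values - q_values.max())  # subtract max for stability
    p = np.exp(z)
    p /= p.sum()
    return rng.choice(len(q_values), p=p), p

# Forgetting lets the estimate track the *recent* environment, which is
# what enables adaptation when the maze changes.
for _ in range(50):
    observe_transition(0, 0, 1)  # environment repeatedly sends 0 -> 1
print(transition_probs(0, 0).argmax())  # prints 1

# The balance parameter beta tunes randomness of action selection.
_, p_greedy = softmax_action(np.array([1.0, 0.0]), beta=10.0)
_, p_random = softmax_action(np.array([1.0, 0.0]), beta=0.1)
print(p_greedy[0] > p_random[0])  # prints True: higher beta is greedier
```

In the paper's scheme, the interesting part is that the counterpart of `beta` is itself adjusted online from the variation of action results and detected environmental change, rather than fixed by hand as here.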