Department of Brain Robot Interface, ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seikacho, Soraku-gun, Kyoto 619-0288, Japan; Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan.
Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan.
Neural Netw. 2016 Dec;84:17-27. doi: 10.1016/j.neunet.2016.07.013. Epub 2016 Aug 26.
Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only works well with binary, or close to binary, state input, where the number of active states is smaller than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and robustness of the FERL method can be improved by scaling the free energy by a constant related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL) instead of the negative free energy, and that this also allows continuous state input to be handled. We validate our proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions.
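The distinction between the free-energy and expected-energy value estimates can be made concrete with the standard RBM quantities. The sketch below is our own minimal illustration, not code from the paper: it assumes an RBM with binary hidden units, the state and action vectors forming the visible layer, no visible bias terms, and hypothetical weight names (W_s, W_a, b_h).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_input(s, a, W_s, W_a, b_h):
    """Total input x_j to each hidden unit, given state s and action a."""
    return W_s.T @ s + W_a.T @ a + b_h

def neg_free_energy(s, a, W_s, W_a, b_h):
    """FERL-style value estimate: -F(s, a) = sum_j log(1 + exp(x_j))."""
    x = hidden_input(s, a, W_s, W_a, b_h)
    return np.sum(np.logaddexp(0.0, x))  # numerically stable softplus

def neg_expected_energy(s, a, W_s, W_a, b_h):
    """EERL-style value estimate: -<E(s, a, h)> = sum_j x_j * sigmoid(x_j),
    the RBM energy averaged over P(h | s, a), with the sign flipped."""
    x = hidden_input(s, a, W_s, W_a, b_h)
    return np.sum(x * sigmoid(x))

# Toy usage with random weights (purely illustrative).
rng = np.random.default_rng(0)
n_s, n_a, n_h = 16, 4, 8
W_s = rng.normal(size=(n_s, n_h))
W_a = rng.normal(size=(n_a, n_h))
b_h = np.zeros(n_h)
s, a = rng.random(n_s), np.eye(n_a)[0]  # continuous state, one-hot action
print(neg_free_energy(s, a, W_s, W_a, b_h),
      neg_expected_energy(s, a, W_s, W_a, b_h))
```

Under these assumptions the two estimates differ only by the entropy of the conditional hidden distribution, since F(s, a) = <E(s, a, h)> - H(P(h | s, a)).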