Wiering Marco A, van Hasselt Hado
Department of Artificial Intelligence, University of Groningen, 9400 AK Groningen, The Netherlands.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):930-6. doi: 10.1109/TSMCB.2008.920231.
This paper describes several ensemble methods that combine multiple different reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of the different RL algorithms. We designed and implemented four ensemble methods that combine the following five RL algorithms: Q-learning, Sarsa, actor-critic (AC), QV-learning, and the AC learning automaton. The intuitively designed ensemble methods, namely, majority voting (MV), rank voting, Boltzmann multiplication (BM), and Boltzmann addition, combine the policies derived from the value functions of the different RL algorithms. This contrasts with previous work, where ensemble methods were used in RL to represent and learn a single value function. We present experiments on five maze problems of varying complexity; the first problem is simple, but the other four maze tasks are of a dynamic or partially observable nature. The results indicate that the BM and MV ensembles significantly outperform the single RL algorithms.
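The two best-performing combination rules from the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the shared temperature parameter `tau`, and the use of raw Q-values as each algorithm's preference values are assumptions for the sake of the example. Boltzmann multiplication multiplies the individual algorithms' Boltzmann (softmax) policies and renormalizes; majority voting lets each algorithm vote for its greedy action and builds a Boltzmann policy over the vote counts.

```python
import numpy as np

def boltzmann_policy(prefs, tau=1.0):
    """Softmax over action preference values with temperature tau."""
    e = np.exp((prefs - prefs.max()) / tau)  # shift by max for numerical stability
    return e / e.sum()

def boltzmann_multiplication(q_list, tau=1.0):
    """Multiply the Boltzmann policies of all algorithms elementwise, then renormalize."""
    combined = np.prod([boltzmann_policy(q, tau) for q in q_list], axis=0)
    return combined / combined.sum()

def majority_voting(q_list, tau=1.0):
    """Each algorithm votes for its greedy action; the vote counts serve as
    preference values for a Boltzmann distribution over actions."""
    votes = np.zeros(len(q_list[0]))
    for q in q_list:
        votes[np.argmax(q)] += 1.0
    return boltzmann_policy(votes, tau)
```

For example, with three algorithms whose Q-values for a state favor actions 0, 0, and 1 respectively, both rules assign the highest ensemble probability to action 0, but Boltzmann multiplication also weighs how strongly each algorithm prefers its choice, whereas majority voting only counts the greedy picks.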