IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2216-2226. doi: 10.1109/TNNLS.2018.2790981.
In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from the replay memory according to the complexity of each transition. The complexity criterion in DCRL consists of a self-paced priority and a coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, improving sample efficiency. The coverage penalty is introduced to promote sample diversity. The DCRL algorithm is evaluated on Atari 2600 games against the deep Q-network (DQN) and prioritized experience replay (PER) methods, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and the dueling network. All the experimental results demonstrate that DCRL achieves improved training efficiency and robustness for deep reinforcement learning.
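The abstract does not give the exact functional forms of the self-paced priority or the coverage penalty, so the following is only a minimal illustrative sketch, assuming a Gaussian-shaped self-paced weighting around the current curriculum difficulty and a replay-count-based coverage penalty; the class, parameter names, and weighting functions are hypothetical, not the authors' definitions.

```python
# Illustrative sketch only: the weighting and penalty forms below are assumptions,
# not the exact DCRL formulas, which are not given in the abstract.
import numpy as np

class CurriculumReplayMemory:
    """Replay memory that scores each transition by a complexity measure
    combining a self-paced priority (TD error vs. current curriculum
    difficulty) and a coverage penalty (replay-count-based diversity)."""

    def __init__(self, capacity, difficulty=1.0, penalty_coef=0.1, eps=1e-6):
        self.capacity = capacity
        self.difficulty = difficulty      # current curriculum difficulty (assumed scalar)
        self.penalty_coef = penalty_coef  # weight of the coverage penalty (assumed)
        self.eps = eps
        self.transitions, self.td_errors, self.replay_counts = [], [], []

    def add(self, transition, td_error):
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.td_errors.pop(0)
            self.replay_counts.pop(0)
        self.transitions.append(transition)
        self.td_errors.append(abs(td_error))
        self.replay_counts.append(0)

    def _complexity(self):
        td = np.asarray(self.td_errors)
        counts = np.asarray(self.replay_counts, dtype=float)
        # Self-paced priority: favor transitions whose TD error matches the
        # current curriculum difficulty (assumed Gaussian-shaped weighting).
        self_paced = np.exp(-((td - self.difficulty) ** 2))
        # Coverage penalty: discount transitions that have been replayed often.
        coverage = 1.0 / (1.0 + self.penalty_coef * counts)
        return self_paced * coverage + self.eps

    def sample(self, batch_size):
        # Sample transitions with probability proportional to their complexity score.
        scores = self._complexity()
        probs = scores / scores.sum()
        idx = np.random.choice(len(self.transitions), size=batch_size, p=probs)
        for i in idx:
            self.replay_counts[i] += 1
        return [self.transitions[i] for i in idx], idx

    def update(self, idx, td_errors, new_difficulty=None):
        # After a learning step, refresh TD errors and optionally advance
        # the curriculum difficulty as training progresses.
        for i, e in zip(idx, td_errors):
            self.td_errors[i] = abs(e)
        if new_difficulty is not None:
            self.difficulty = new_difficulty
```

In this sketch the curriculum is advanced simply by raising the difficulty scalar passed to `update`, so that transitions with larger TD errors gradually receive higher priority while frequently replayed transitions are de-emphasized for diversity.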