Wang Xuesong, Gu Yang, Cheng Yuhu, Liu Aiping, Chen C L Philip
IEEE Trans Neural Netw Learn Syst. 2020 Jun;31(6):1820-1830. doi: 10.1109/TNNLS.2019.2927227. Epub 2019 Aug 6.
In recent years, deep reinforcement learning (DRL) algorithms have developed rapidly and have achieved excellent performance on many challenging tasks. However, because of the complexity of the network structure and the large number of network parameters, training a deep network is time-consuming, which limits the learning efficiency of DRL. In this paper, aiming to speed up the learning process of a DRL agent, we propose a novel approximate policy-based accelerated (APA) algorithm derived from the error analysis of approximate policy iteration reinforcement learning algorithms. The proposed APA is proven to converge even with a more aggressive learning rate, enabling the DRL agent to learn faster. Furthermore, to combine the accelerated algorithm with deep Q-network (DQN), Double DQN, and deep deterministic policy gradient (DDPG), we propose three novel DRL algorithms: APA-DQN, APA-Double DQN, and APA-DDPG, which demonstrates the adaptability of the accelerated algorithm to existing DRL algorithms. We have tested the proposed algorithms on both discrete-action and continuous-action tasks. Their superior performance demonstrates their great potential for practical applications.
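The abstract does not spell out APA's update rule, so the following is only a minimal, hypothetical sketch of what "accelerating" a value-based learner can mean in practice: a momentum term added to a plain tabular Q-learning update, so that consistent temporal-difference steps compound into larger effective steps. The toy MDP, the variable names (velocity, beta), and the momentum form are illustrative assumptions, not the paper's method.

    import numpy as np

    # Hypothetical illustration only: the paper's APA update rule is not given
    # in the abstract. This sketch adds a momentum buffer to an ordinary
    # temporal-difference Q-learning step on a toy random MDP.

    n_states, n_actions = 8, 2
    rng = np.random.default_rng(0)

    # Toy random MDP: deterministic transition table and reward table.
    P = rng.integers(0, n_states, size=(n_states, n_actions))  # next state
    R = rng.normal(size=(n_states, n_actions))                 # reward

    Q = np.zeros((n_states, n_actions))
    velocity = np.zeros_like(Q)         # momentum buffer (assumed name)
    alpha, gamma, beta = 0.5, 0.9, 0.8  # learning rate, discount, momentum

    s = 0
    for step in range(5000):
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
        s_next, r = int(P[s, a]), float(R[s, a])

        # Standard TD error for Q-learning.
        td_error = r + gamma * Q[s_next].max() - Q[s, a]

        # Accelerated update: blend the current TD step with the previous
        # step, so repeated steps in the same direction compound.
        velocity[s, a] = beta * velocity[s, a] + alpha * td_error
        Q[s, a] += velocity[s, a]

        s = s_next

    print("Greedy policy:", Q.argmax(axis=1))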