Zhao Haoli, Wu Jiqiang, Li Zhenni, Chen Wuhui, Zheng Zibin
IEEE Trans Cybern. 2023 Feb;53(2):765-778. doi: 10.1109/TCYB.2022.3157892. Epub 2023 Jan 13.
Deep reinforcement learning (DRL), whose performance depends heavily on the learned data representation, has shown its potential in many practical decision-making problems. However, representation learning in DRL is easily affected by model interference and, moreover, retains unnecessary parameters, which degrades control performance. In this article, we propose a double-sparse DRL method based on multilayer sparse coding and nonconvex regularized pruning. To alleviate interference in DRL, we design a multilayer sparse-coding-structured network that yields deep sparse representations for reinforcement-learning control. Furthermore, we employ a nonconvex log regularizer to promote strong sparsity, efficiently removing unnecessary weights through a regularizer-based pruning scheme. The resulting double-sparse DRL algorithm not only learns deep sparse representations that reduce interference but also removes redundant weights while maintaining robust performance. Experimental results in five benchmark environments under the deep Q-network (DQN) architecture demonstrate that the proposed method, with deep sparse representations from the multilayer sparse-coding structure, outperforms existing sparse-coding-based DRL in control: for example, it completes Mountain Car in 140.81 steps, a nearly 10% reward increase over the single-layer sparse-coding DRL algorithm, and scores 286.08 in Catcher, more than twice the rewards of the other algorithms. Moreover, the proposed algorithm removes over 80% of the parameters while retaining the performance gains from deep sparse representations.
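To make the two mechanisms in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: an ISTA-style multilayer sparse-coding encoder (unrolled soft-thresholding layers) feeding a linear Q-head, a nonconvex log penalty of the form sum(log(1 + |w|/eps)) added to the TD loss, and a magnitude-pruning pass over the penalized weights. All names, layer sizes, and hyperparameters (`SparseCodingLayer`, `eps`, `threshold`, the 1e-4 penalty weight) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseCodingLayer(nn.Module):
    """One unrolled ISTA step: linear dictionary projection + soft thresholding."""
    def __init__(self, in_dim, code_dim, theta=0.1):
        super().__init__()
        self.W = nn.Linear(in_dim, code_dim, bias=False)
        self.theta = theta  # soft-threshold level controlling code sparsity

    def forward(self, x):
        z = self.W(x)
        # soft thresholding: sign(z) * max(|z| - theta, 0) yields a sparse code
        return torch.sign(z) * F.relu(z.abs() - self.theta)

class DoubleSparseQNet(nn.Module):
    """Stacked sparse-coding layers (deep sparse representation) + linear Q-head."""
    def __init__(self, obs_dim, n_actions, hidden=(64, 64)):
        super().__init__()
        dims = (obs_dim,) + hidden
        self.encoder = nn.Sequential(
            *[SparseCodingLayer(dims[i], dims[i + 1]) for i in range(len(hidden))]
        )
        self.q_head = nn.Linear(hidden[-1], n_actions)

    def forward(self, obs):
        return self.q_head(self.encoder(obs))

def log_regularizer(model, eps=0.01):
    """Nonconvex log penalty sum(log(1 + |w|/eps)); stronger sparsity than L1."""
    return sum(torch.log1p(p.abs() / eps).sum() for p in model.parameters())

def prune_small_weights(model, threshold=1e-3):
    """Regularizer-based pruning: zero out weights driven near zero by the penalty."""
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).float())

# Usage inside a standard DQN update (TD-target computation omitted for brevity):
net = DoubleSparseQNet(obs_dim=4, n_actions=2)
obs = torch.randn(32, 4)
q_values = net(obs)
td_loss = q_values.mean()  # placeholder for the usual Huber TD loss
loss = td_loss + 1e-4 * log_regularizer(net)
loss.backward()
prune_small_weights(net)  # applied after (or periodically during) training
```

In this reading, "double sparse" refers to sparsity at two levels: the activations (the codes produced by the thresholding layers) and the weights (driven toward zero by the log penalty and then pruned).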