Dynamic sparse coding-based value estimation network for deep reinforcement learning.

Affiliations

School of Automation, Guangdong University of Technology, Guangzhou, 510006, China; Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou 510006, China.

School of Automation, Guangdong University of Technology, Guangzhou, 510006, China; 111 Center for Intelligent Batch Manufacturing Based on IoT Technology (GDUT), Guangzhou, 510006, China.

Publication information

Neural Netw. 2023 Nov;168:180-193. doi: 10.1016/j.neunet.2023.09.013. Epub 2023 Sep 11.

Abstract

Deep Reinforcement Learning (DRL) is a powerful tool for a variety of control automation problems. The performance of DRL depends heavily on how accurately the values of environment states are estimated. However, the Value Estimation Network (VEN) in DRL is easily affected by catastrophic interference arising from the environment and from training. In this paper, we propose a Dynamic Sparse Coding-based (DSC) VEN model that obtains precise sparse representations for accurate value prediction and sparse parameters for efficient training. The model is applicable both to Q-learning-structured discrete-action DRL and to actor-critic-structured continuous-action DRL. Specifically, to alleviate interference in the VEN, we employ DSC to learn sparse representations for accurate value estimation with dynamic gradients, going beyond the conventional ℓ₁ norm, which provides same-value gradients. To avoid the influence of redundant parameters, we employ DSC to prune weights with dynamic thresholds, which is more efficient than static thresholds such as those of the ℓ₁ norm. Experiments demonstrate that the proposed algorithms with dynamic sparse coding achieve higher control performance than existing benchmark DRL algorithms in both discrete-action and continuous-action environments, e.g., an increase of over 25% in Puddle World and about 10% in Hopper. Moreover, the proposed algorithm converges efficiently, requiring fewer episodes across different environments.
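
The contrast the abstract draws between same-value gradients and dynamic gradients, and between static and dynamic pruning thresholds, can be illustrated with a minimal sketch. The abstract does not state the DSC regularizer, so the log-type penalty and the magnitude-dependent threshold below are illustrative stand-ins chosen as assumptions, not the authors' formulation. All function names and the parameters `lam` and `eps` are hypothetical.

```python
import math


def l1_grad(x):
    """Subgradient of |x|: the same magnitude (1) for every nonzero x."""
    return 0.0 if x == 0 else math.copysign(1.0, x)


def log_penalty_grad(x, eps=0.1):
    """Gradient of log(1 + |x| / eps), an ASSUMED stand-in penalty.

    It equals sign(x) / (|x| + eps), so small entries are pushed toward
    zero harder than large ones (a magnitude-dependent, "dynamic" gradient).
    """
    return 0.0 if x == 0 else math.copysign(1.0, x) / (abs(x) + eps)


def soft_threshold(w, t):
    """Shrink w toward zero by t; anything with |w| <= t becomes zero."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def prune_static(weights, lam):
    """Static threshold: every weight is shrunk by the same amount lam."""
    return [soft_threshold(w, lam) for w in weights]


def prune_dynamic(weights, lam, eps=0.1):
    """ASSUMED dynamic threshold lam / (|w| + eps): small weights face a
    large threshold and are pruned, large weights are barely shrunk."""
    return [soft_threshold(w, lam / (abs(w) + eps)) for w in weights]


if __name__ == "__main__":
    weights = [0.05, -0.2, 1.0, -3.0]
    print("l1 grads:  ", [l1_grad(w) for w in weights])
    print("log grads: ", [log_penalty_grad(w) for w in weights])
    print("static:    ", prune_static(weights, lam=0.1))
    print("dynamic:   ", prune_dynamic(weights, lam=0.1))
```

Under these assumptions, the static rule shrinks every surviving weight by the same amount, while the magnitude-dependent rule removes more of the small weights and leaves large weights closer to their original values.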
