Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network.

Publication Information

IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):4374-4380. doi: 10.1109/TNNLS.2019.2948892. Epub 2019 Nov 22.

Abstract

The deep Q-network (DQN) and return-based reinforcement learning are two promising algorithms proposed in recent years. The DQN brings advances to complex sequential decision problems, while return-based algorithms have advantages in making use of sample trajectories. In this brief, we propose a general framework to combine the DQN and most of the return-based reinforcement learning algorithms, named R-DQN. We show that the performance of the traditional DQN can be significantly improved by introducing return-based algorithms. In order to further improve the R-DQN, we design a strategy with two measurements to qualitatively measure the policy discrepancy. We conduct experiments on several representative tasks from the OpenAI Gym and Atari games. The state-of-the-art performance achieved by our method with this proposed strategy validates its effectiveness.
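The abstract does not spell out the update rule, but return-based methods such as Retrace(λ) and Q(λ) share a common shape: a multi-step target built from off-policy-corrected TD errors, with a trace coefficient that shrinks as the behavior policy drifts from the target policy. That trace is the natural place for a policy-discrepancy measurement to plug in. The sketch below shows a generic Retrace-style target in NumPy; the function `retrace_targets`, its argument names, and the truncated importance weight λ·min(1, π/μ) are illustrative assumptions, not the paper's exact R-DQN rule.

```python
import numpy as np

def retrace_targets(q_taken, exp_q_next, rewards, pi_probs, mu_probs,
                    gamma=0.99, lam=1.0):
    """Retrace(lambda)-style multi-step targets for one sampled trajectory.

    Illustrative sketch (not the paper's exact R-DQN specification):
      q_taken[t]    = Q(x_t, a_t) for the action actually taken
      exp_q_next[t] = E_pi[Q(x_{t+1}, .)] under the target policy
                      (0 for a terminal step)
      pi_probs[t], mu_probs[t] = pi(a_t | x_t), mu(a_t | x_t)
    """
    T = len(rewards)
    # Truncated importance weight c_t = lam * min(1, pi/mu). The ratio
    # shrinks the trace when behavior and target policies disagree, which
    # is where a policy-discrepancy measurement would enter.
    c = lam * np.minimum(1.0, pi_probs / mu_probs)
    # One-step TD errors: delta_t = r_t + gamma * E_pi Q(x_{t+1}, .) - Q(x_t, a_t)
    delta = rewards + gamma * exp_q_next - q_taken
    # Backward recursion A_t = delta_t + gamma * c_{t+1} * A_{t+1};
    # the regression target is then Q(x_t, a_t) + A_t.
    targets = np.empty_like(delta)
    acc = 0.0
    for t in reversed(range(T)):
        acc = delta[t] + gamma * (c[t + 1] * acc if t + 1 < T else 0.0)
        targets[t] = q_taken[t] + acc
    return targets

# Tiny usage example with a 3-step trajectory (made-up numbers):
targets = retrace_targets(
    q_taken=np.array([1.0, 0.5, 0.2]),
    exp_q_next=np.array([0.8, 0.4, 0.0]),  # last step terminal
    rewards=np.array([0.0, 0.0, 1.0]),
    pi_probs=np.array([0.9, 0.6, 0.8]),
    mu_probs=np.array([0.9, 0.9, 0.8]),
)
# In an R-DQN-style setup these targets would replace the one-step DQN
# target r + gamma * max_a Q_target(x', a) in the regression loss.
```

When π and μ coincide, c_t reduces to λ and the recursion recovers on-policy Q(λ); how aggressively to cut the trace when the policies disagree is the kind of decision that discrepancy measurements such as the two proposed in the paper would inform.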
