Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network.

Publication Info

IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):4374-4380. doi: 10.1109/TNNLS.2019.2948892. Epub 2019 Nov 22.

DOI: 10.1109/TNNLS.2019.2948892
PMID: 31765320
Abstract

The deep Q-network (DQN) and return-based reinforcement learning are two promising algorithms proposed in recent years. The DQN brings advances to complex sequential decision problems, while return-based algorithms have advantages in making use of sample trajectories. In this brief, we propose a general framework to combine the DQN and most of the return-based reinforcement learning algorithms, named R-DQN. We show that the performance of the traditional DQN can be significantly improved by introducing return-based algorithms. In order to further improve the R-DQN, we design a strategy with two measurements to qualitatively measure the policy discrepancy. We conduct experiments on several representative tasks from the OpenAI Gym and Atari games. The state-of-the-art performance achieved by our method with this proposed strategy validates its effectiveness.
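The two ingredients the abstract describes — return-based (e.g., λ-return) targets in place of the DQN's one-step bootstrap, and a measure of how far the behavior policy that generated a trajectory has drifted from the current greedy policy — can be sketched roughly as below. This is an illustrative NumPy sketch under standard definitions, not the authors' R-DQN implementation; the function names and the simple disagreement-rate measure are assumptions for illustration.

```python
import numpy as np

def lambda_return_targets(rewards, q_next_max, dones, gamma=0.99, lam=0.8):
    """Lambda-return targets for one sampled trajectory.

    rewards[t], dones[t]: reward and terminal flag at step t (length T).
    q_next_max[t]: max_a Q(s_{t+1}, a) from the target network (length T).
    Recursion: G_t = r_t + gamma * ((1 - lam) * max_a Q(s_{t+1}, a) + lam * G_{t+1}),
    falling back to the one-step target at trajectory ends and terminal steps.
    """
    T = len(rewards)
    targets = np.zeros(T)
    g = 0.0
    for t in reversed(range(T)):
        one_step = rewards[t] + gamma * (1.0 - dones[t]) * q_next_max[t]
        if t == T - 1 or dones[t]:
            g = one_step                      # no future return to blend in
        else:
            g = rewards[t] + gamma * ((1.0 - lam) * q_next_max[t] + lam * g)
        targets[t] = g
    return targets

def policy_discrepancy(behavior_actions, greedy_actions):
    """A simple qualitative discrepancy measure: the fraction of steps where
    the stored behavior action disagrees with the current greedy action.
    Larger values mean the trajectory is more off-policy for the learner."""
    behavior_actions = np.asarray(behavior_actions)
    greedy_actions = np.asarray(greedy_actions)
    return float(np.mean(behavior_actions != greedy_actions))
```

A learner could, for instance, shorten the eligibility of a stored trajectory (or fall back to one-step targets) once `policy_discrepancy` crosses a threshold — the general motivation behind truncating return-based updates when trajectories become too off-policy.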


Similar Articles

1. Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network. IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):4374-4380. doi: 10.1109/TNNLS.2019.2948892. Epub 2019 Nov 22.
2. Minibatch Recursive Least Squares Q-Learning. Comput Intell Neurosci. 2021 Oct 8;2021:5370281. doi: 10.1155/2021/5370281. eCollection 2021.
3. Approximate Policy-Based Accelerated Deep Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2020 Jun;31(6):1820-1830. doi: 10.1109/TNNLS.2019.2927227. Epub 2019 Aug 6.
4. Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning. Front Neurorobot. 2019 Dec 10;13:103. doi: 10.3389/fnbot.2019.00103. eCollection 2019.
5. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018 Nov;107:3-11. doi: 10.1016/j.neunet.2017.12.012. Epub 2018 Jan 11.
6. MonkeyKing: Adaptive Parameter Tuning on Big Data Platforms with Deep Reinforcement Learning. Big Data. 2020 Aug;8(4):270-290. doi: 10.1089/big.2019.0123. Epub 2020 Jul 10.
7. Deep reinforcement learning for automated radiation adaptation in lung cancer. Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.
8. Multisource Transfer Double DQN Based on Actor Learning. IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2227-2238. doi: 10.1109/TNNLS.2018.2806087.
9. Teleconsultation dynamic scheduling with a deep reinforcement learning approach. Artif Intell Med. 2024 Mar;149:102806. doi: 10.1016/j.artmed.2024.102806. Epub 2024 Feb 9.
10. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2216-2226. doi: 10.1109/TNNLS.2018.2790981.