Multisource Transfer Double DQN Based on Actor Learning.

Publication Information

IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2227-2238. doi: 10.1109/TNNLS.2018.2806087.


DOI: 10.1109/TNNLS.2018.2806087
PMID: 29771674
Abstract

Deep reinforcement learning (RL) combines the psychological mechanisms of "trial and error" and "reward and punishment" from RL with the powerful feature representation and nonlinear mapping of deep learning, and currently plays an essential role in artificial intelligence and machine learning. Since an RL agent must constantly interact with its surroundings, the deep Q-network (DQN) inevitably has to learn a large number of network parameters, which results in low learning efficiency. In this paper, a multisource transfer double DQN (MTDDQN) based on actor learning is proposed. Transfer learning is integrated with deep RL so that the RL agent can collect, summarize, and transfer action knowledge, including policy mimic and feature regression, to the training of related tasks. DQN suffers from action overestimation, i.e., the lower probability limit of the action corresponding to the maximum Q value is nonzero. The transfer network is therefore trained with double DQN to eliminate the error accumulation caused by action overestimation. In addition, to avoid negative transfer, i.e., to ensure strong correlations between source and target tasks, a multisource transfer learning mechanism is applied. Atari 2600 games are tested on the Arcade Learning Environment platform to evaluate the feasibility and performance of MTDDQN against mainstream approaches such as DQN and double DQN. Experiments show that MTDDQN achieves not only human-like actor-learning transfer capability but also the desired learning efficiency and testing accuracy on the target task.
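
The key fix the abstract describes is replacing the single-network max in the DQN bootstrap target with a decoupled selection/evaluation step. Below is a minimal NumPy sketch of that double-DQN target computation, under assumed interfaces (q_online and q_target as callables returning per-action Q-values); it is illustrative only, not the paper's actual MTDDQN implementation.

```python
import numpy as np

def double_dqn_targets(q_online, q_target, rewards, next_states, dones,
                       gamma=0.99):
    """Double-DQN bootstrap targets (illustrative sketch, not the paper's code).

    q_online, q_target: callables mapping a (batch, state_dim) array to a
    (batch, n_actions) array of Q-values. Vanilla DQN selects AND evaluates
    the next action with the same network, which biases targets upward
    (the action overestimation the abstract mentions). Double DQN decouples
    the two roles: the online network selects, the target network evaluates.
    """
    best_actions = np.argmax(q_online(next_states), axis=1)   # selection
    evaluated = q_target(next_states)[np.arange(len(best_actions)),
                                      best_actions]           # evaluation
    return rewards + gamma * (1.0 - dones) * evaluated

# Tiny smoke test with random linear "networks" as stand-ins.
rng = np.random.default_rng(0)
w_online, w_target = rng.standard_normal((2, 4, 3))
targets = double_dqn_targets(lambda s: s @ w_online,
                             lambda s: s @ w_target,
                             rewards=np.ones(8),
                             next_states=rng.standard_normal((8, 4)),
                             dones=np.zeros(8))
print(targets.shape)  # -> (8,)
```

In the paper's setting, training the transfer network with this decoupled target is what is credited with removing the error accumulation that action overestimation would otherwise feed into the transferred knowledge.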

Similar Articles

[1] Multisource Transfer Double DQN Based on Actor Learning. IEEE Trans Neural Netw Learn Syst. 2018 Jun.
[2] Deep reinforcement learning for automated radiation adaptation in lung cancer. Med Phys. 2017 Nov 14.
[3] Integrated Double Estimator Architecture for Reinforcement Learning. IEEE Trans Cybern. 2022 May.
[4] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning. Front Neurorobot. 2019 Dec 10.
[5] Recognition of Hand Gestures Based on EMG Signals with Deep and Double-Deep Q-Networks. Sensors (Basel). 2023 Apr 12.
[6] Approximate Policy-Based Accelerated Deep Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2020 Jun.
[7] Application of Deep Reinforcement Learning to NS-SHAFT Game Signal Control. Sensors (Basel). 2022 Jul 14.
[8] Learning to Predict Consequences as a Method of Knowledge Transfer in Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2017 Apr 17.
[9] RL-Chord: CLSTM-Based Melody Harmonization Using Deep Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2024 Aug.
[10] Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture. IEEE Trans Neural Netw Learn Syst. 2022 May.
