IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2227-2238. doi: 10.1109/TNNLS.2018.2806087.
Deep reinforcement learning (RL) combines the psychological mechanisms of "trial and error" and "reward and punishment" in RL with the powerful feature representation and nonlinear mapping capabilities of deep learning. Currently, it plays an essential role in the fields of artificial intelligence and machine learning. Because an RL agent must constantly interact with its environment, the deep Q network (DQN) inevitably has to learn a large number of network parameters, which results in low learning efficiency. In this paper, a multisource transfer double DQN (MTDDQN) based on actor learning is proposed. The transfer learning technique is integrated with deep RL so that the RL agent can collect, summarize, and transfer action knowledge, including policy mimicking and feature regression, to the training of related tasks. DQN suffers from action overestimation; that is, the lower bound on the probability of selecting the action with the maximum Q value is nonzero. Therefore, the transfer network is trained with double DQN to eliminate the error accumulation caused by action overestimation. In addition, to avoid negative transfer, i.e., to ensure strong correlations between source and target tasks, a multisource transfer learning mechanism is applied. Atari 2600 games are tested on the Arcade Learning Environment platform to evaluate the feasibility and performance of MTDDQN by comparing it with mainstream approaches such as DQN and double DQN. Experiments show that MTDDQN achieves not only human-like actor-learning transfer capability but also the desired learning efficiency and testing accuracy on the target task.
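The abstract attributes the error accumulation to the coupling of action selection and action evaluation in the standard DQN target. The minimal sketch below, which is not the authors' code, contrasts the standard DQN target with the double DQN target that MTDDQN relies on; the network class `QNet`, the function names, and the toy batch are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small illustrative Q network (architecture is an assumption)."""
    def __init__(self, obs_dim=4, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def dqn_target(reward, next_obs, done, target_net, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates the next
    # action, so max_a Q_target(s', a) tends to overestimate action values.
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q

def double_dqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network selects argmax_a Q_online(s', a) and the
    # target network evaluates it, decoupling selection from evaluation and
    # curbing the overestimation bias described in the abstract.
    with torch.no_grad():
        best_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, best_actions).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q

if __name__ == "__main__":
    online_net, target_net = QNet(), QNet()
    batch = 8
    reward = torch.zeros(batch)
    next_obs = torch.randn(batch, 4)
    done = torch.zeros(batch)
    print("DQN target:       ", dqn_target(reward, next_obs, done, target_net))
    print("Double DQN target:", double_dqn_target(reward, next_obs, done,
                                                  online_net, target_net))
```

In MTDDQN this decoupled target is used when training the transfer network, so the action knowledge distilled from the source tasks is not contaminated by accumulated overestimation error.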