
Meta Attention for Off-Policy Actor-Critic

Authors

Huang Jiateng, Huang Wanrong, Lan Long, Wu Dan

Affiliations

National University of Defense Technology, College of Computer Science and Technology, Institute for Quantum Information & State Key Laboratory of High Performance Computing, Changsha, 410073, Hunan, China.

Publication

Neural Netw. 2023 Jun;163:86-96. doi: 10.1016/j.neunet.2023.03.024. Epub 2023 Mar 28.

Abstract

Off-Policy Actor-Critic methods can effectively exploit past experiences, and they have therefore achieved great success in various reinforcement learning tasks. In many image-based and multi-agent tasks, the attention mechanism has been employed in Actor-Critic methods to improve their sample efficiency. In this paper, we propose a meta-attention method for state-based reinforcement learning tasks, which combines the attention mechanism with meta-learning on top of the Off-Policy Actor-Critic framework. Unlike previous attention-based work, our meta-attention method introduces attention into the Actor and the Critic of the typical Actor-Critic framework, rather than over the pixels of an image or over multiple information sources, as in image-based control tasks or multi-agent systems. In contrast to existing meta-learning methods, the proposed meta-attention approach operates in both the gradient-based training phase and the agent's decision-making process. Experimental results demonstrate the superiority of our meta-attention method on various continuous control tasks, built on Off-Policy Actor-Critic methods including DDPG and TD3.
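The abstract's core idea, attention applied inside the Actor and the Critic themselves rather than over image pixels, can be sketched as a learnable soft-attention weighting of the state vector that is fed to both networks. The sketch below is an illustrative reconstruction only, not the authors' implementation; the names `StateAttention` and `state_dim` are assumptions, and in the paper the attention parameters would be meta-learned jointly with the Actor/Critic weights.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

class StateAttention:
    """Soft attention over the dimensions of a state vector.

    The attention logits `w` stand in for parameters that would be
    meta-learned alongside the Actor/Critic; here they are simply
    initialized at random for illustration.
    """
    def __init__(self, state_dim, rng):
        self.w = 0.1 * rng.standard_normal(state_dim)  # attention logits

    def __call__(self, state):
        alpha = softmax(self.w)   # weights over state dims, sum to 1
        return alpha * state      # re-weighted state features

# Usage: the attended state would be passed to the Actor (and,
# analogously, to the Critic) in place of the raw observation.
rng = np.random.default_rng(0)
attn = StateAttention(state_dim=4, rng=rng)
state = np.array([1.0, -2.0, 0.5, 3.0])
weighted_state = attn(state)
```

Because the same weighting is differentiable, gradients from the Actor and Critic losses can flow into the attention logits during training, which is what allows the mechanism to act in both the training phase and the decision-making process described above.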

