Jiang Peng, Song Shiji, Huang Gao
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4033-4046. doi: 10.1109/TNNLS.2021.3121432. Epub 2023 Aug 4.
Meta reinforcement learning (meta-RL) is a promising technique for fast task adaptation that leverages prior knowledge from previous tasks. Recently, context-based meta-RL has been proposed to improve data efficiency through a principled framework that divides the learning procedure into task inference and task execution. However, this approach does not adequately leverage task information, which leads to inefficient exploration. To address this problem, we propose a novel context-based meta-RL framework with an improved exploration mechanism. For the exploration-execution problem in existing context-based meta-RL, we propose a novel objective with two exploration terms that encourage better exploration in the action space and the task-embedding space, respectively. The first term promotes diversity in task inference, while the second term, named action information, serves to share or hide task information in different exploration stages. According to how action information is used, we divide the meta-training procedure into a task-independent exploration stage and a task-relevant exploration stage. By decoupling task inference from task execution and proposing a separate optimization objective for each exploration stage, we can efficiently learn the policy and task-inference networks. We compare our algorithm with several popular meta-RL methods on MuJoCo benchmarks under both dense- and sparse-reward settings. The empirical results show that our method significantly outperforms the baselines in both sample efficiency and task performance.
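The abstract only sketches the two-term objective, so the following minimal PyTorch snippet illustrates one plausible reading of it; the function name, the pairwise-distance surrogate for task-inference diversity, and the KL-divergence proxy for "action information" are our illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def exploration_bonus(z, pi_logits_task, pi_logits_free, stage):
    """Illustrative two-term exploration bonus (names hypothetical).

    z:              (B, D) task embeddings from the inference network
    pi_logits_task: (B, A) action logits of the task-conditioned policy
    pi_logits_free: (B, A) action logits of a task-agnostic policy
    stage:          'task_independent' or 'task_relevant'
    """
    # Term 1: encourage diverse task embeddings. Here a simple mean
    # pairwise-distance surrogate; the paper's exact diversity measure
    # is not specified in the abstract.
    diversity = torch.pdist(z).mean()

    # Term 2: "action information" -- how much the task embedding
    # influences the action distribution, modeled here as the KL
    # divergence between the task-agnostic and task-conditioned policies.
    log_p_task = F.log_softmax(pi_logits_task, dim=-1)
    p_free = F.softmax(pi_logits_free, dim=-1)
    action_info = F.kl_div(log_p_task, p_free, reduction="batchmean")

    if stage == "task_independent":
        # Hide task information: penalize dependence of actions on the
        # task embedding so exploration does not commit to a task.
        return diversity - action_info
    # Task-relevant stage: share task information, rewarding actions
    # that exploit the inferred task.
    return diversity + action_info

# Hypothetical usage with random tensors (batch of 8, 5-dim embeddings,
# 4 discrete actions):
z = torch.randn(8, 5)
task_logits = torch.randn(8, 4)
free_logits = torch.randn(8, 4)
print(exploration_bonus(z, task_logits, free_logits, "task_independent"))

Switching the sign of the action-information term between the two stages is what lets a single objective first drive broad, task-agnostic exploration and then task-directed exploitation, matching the staged meta-training the abstract describes.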