

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Authors

Bai Chenjia, Liu Peng, Liu Kaiyu, Wang Lingxiao, Zhao Yingnan, Han Lei, Wang Zhaoran

Publication

IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4776-4790. doi: 10.1109/TNNLS.2021.3129160. Epub 2023 Aug 4.

DOI: 10.1109/TNNLS.2021.3129160
PMID: 34851835
Abstract

Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from environments are sparse or even totally disregarded. Significant advances based on intrinsic motivation show promising results in simple environments but often get stuck in environments with multimodal and stochastic dynamics. In this work, we propose a variational dynamic model based on the conditional variational inference to model the multimodality and stochasticity. We consider the environmental state-action transition as a conditional generative process by generating the next-state prediction under the condition of the current state, action, and latent variable, which provides a better understanding of the dynamics and leads to a better performance in exploration. We derive an upper bound of the negative log likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration, which allows the agent to learn skills by self-supervised exploration without observing extrinsic rewards. We evaluate the proposed method on several image-based simulation tasks and a real robotic manipulating task. Our method outperforms several state-of-the-art environment model-based exploration approaches.
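The intrinsic reward described above is the negative evidence lower bound of a conditional VAE: an upper bound on the negative log likelihood -log p(s'|s,a), equal to the reconstruction loss of the next state plus the KL divergence between the posterior q(z|s,a,s') and the conditional prior p(z|s,a). The sketch below illustrates only how that bound is assembled; it is not the paper's architecture. The network shapes, toy dimensions, random weights, and the fixed unit-variance Gaussian decoder are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration): state, action, latent, hidden.
d_s, d_a, d_z, d_h = 4, 2, 3, 16

def mlp(w, b, x):
    """One-layer tanh network standing in for the deep models in the paper."""
    return np.tanh(x @ w + b)

# Random, untrained weights for encoder q(z|s,a,s'), prior p(z|s,a),
# and decoder p(s'|s,a,z). In practice these are trained jointly.
params = {
    "Wq": 0.1 * rng.standard_normal((d_s + d_a + d_s, d_h)), "bq": np.zeros(d_h),
    "Wq_mu": 0.1 * rng.standard_normal((d_h, d_z)),
    "Wq_lv": 0.1 * rng.standard_normal((d_h, d_z)),
    "Wp": 0.1 * rng.standard_normal((d_s + d_a, d_h)), "bp": np.zeros(d_h),
    "Wp_mu": 0.1 * rng.standard_normal((d_h, d_z)),
    "Wp_lv": 0.1 * rng.standard_normal((d_h, d_z)),
    "Wd": 0.1 * rng.standard_normal((d_s + d_a + d_z, d_h)), "bd": np.zeros(d_h),
    "Wd_out": 0.1 * rng.standard_normal((d_h, d_s)),
}

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ), summed over latent dimensions."""
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

def intrinsic_reward(s, a, s_next, p):
    # Posterior q(z | s, a, s') over the latent variable.
    h_q = mlp(p["Wq"], p["bq"], np.concatenate([s, a, s_next]))
    mu_q, logvar_q = h_q @ p["Wq_mu"], h_q @ p["Wq_lv"]
    # Conditional prior p(z | s, a).
    h_p = mlp(p["Wp"], p["bp"], np.concatenate([s, a]))
    mu_p, logvar_p = h_p @ p["Wp_mu"], h_p @ p["Wp_lv"]
    # Reparameterized latent sample z ~ q(z | s, a, s').
    z = mu_q + np.exp(0.5 * logvar_q) * rng.standard_normal(d_z)
    # Decoder mean for p(s' | s, a, z), modeled as a unit-variance Gaussian.
    h_d = mlp(p["Wd"], p["bd"], np.concatenate([s, a, z]))
    mean = h_d @ p["Wd_out"]
    recon_nll = 0.5 * np.sum((s_next - mean) ** 2) + 0.5 * d_s * np.log(2 * np.pi)
    # Negative ELBO: an upper bound on -log p(s' | s, a), used as the reward.
    return recon_nll + gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)

# A single transition (s, a, s') yields one intrinsic reward.
s = rng.standard_normal(d_s)
a = rng.standard_normal(d_a)
s_next = rng.standard_normal(d_s)
r = intrinsic_reward(s, a, s_next, params)
print(f"intrinsic reward (negative ELBO): {r:.4f}")
```

Because the KL term is nonnegative and the unit-variance reconstruction term is bounded below by (d_s/2)·log 2π, this reward is always positive; transitions the model predicts poorly (poorly reconstructed or far from the prior) receive larger rewards, which is what drives exploration toward hard-to-model dynamics.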


Similar Articles

1. Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4776-4790. doi: 10.1109/TNNLS.2021.3129160. Epub 2023 Aug 4.
2. VASE: Variational Assorted Surprise Exploration for Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2023 Mar;34(3):1243-1252. doi: 10.1109/TNNLS.2021.3105140. Epub 2023 Feb 28.
3. Strangeness-driven exploration in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
4. LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.
Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
5. End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation.
Comput Intell Neurosci. 2021 Dec 16;2021:9945044. doi: 10.1155/2021/9945044. eCollection 2021.
6. A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space.
Neural Netw. 2023 Jul;164:419-427. doi: 10.1016/j.neunet.2023.04.042. Epub 2023 May 5.
7. WToE: Learning When to Explore in Multiagent Reinforcement Learning.
IEEE Trans Cybern. 2024 Aug;54(8):4789-4801. doi: 10.1109/TCYB.2023.3328732. Epub 2024 Jul 18.
8. Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2019 Nov;30(11):3409-3418. doi: 10.1109/TNNLS.2019.2891792. Epub 2019 Jan 29.
9. Discovering diverse solutions in deep reinforcement learning by maximizing state-action-based mutual information.
Neural Netw. 2022 Aug;152:90-104. doi: 10.1016/j.neunet.2022.04.009. Epub 2022 Apr 16.
10. Variational Information Bottleneck Regularized Deep Reinforcement Learning for Efficient Robotic Skill Adaptation.
Sensors (Basel). 2023 Jan 9;23(2):762. doi: 10.3390/s23020762.