



Balancing State Exploration and Skill Diversity in Unsupervised Skill Discovery.

Authors

Liu Xin, Chen Yaran, Chen Guixing, Li Haoran, Zhao Dongbin

Publication

IEEE Trans Cybern. 2025 May;55(5):2234-2247. doi: 10.1109/TCYB.2025.3548821. Epub 2025 Apr 23.

DOI: 10.1109/TCYB.2025.3548821
PMID: 40138236
Abstract

Unsupervised skill discovery seeks to acquire different useful skills without extrinsic reward via unsupervised reinforcement learning (RL), with the discovered skills efficiently adapting to multiple downstream tasks in various ways. However, recent advanced skill discovery methods struggle to well balance state exploration and skill diversity, particularly when the potential skills are rich and hard to discern. In this article, we propose contrastive dynamic skill discovery (ComSD) which generates diverse and exploratory unsupervised skills through a novel intrinsic incentive, named contrastive dynamic reward. It contains a particle-based exploration reward to make agents access far-reaching states for exploratory skill acquisition, and a novel contrastive diversity reward to promote the discriminability between different skills. Moreover, a novel dynamic weighting mechanism between the above two rewards is proposed to balance state exploration and skill diversity, which further enhances the quality of the discovered skills. Extensive experiments and analysis demonstrate that ComSD can generate diverse behaviors at different exploratory levels for multijoint robots, enabling state-of-the-art adaptation performance on challenging downstream tasks. It can also discover distinguishable and far-reaching exploration skills in the challenging tree-like 2-D maze.
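The reward structure the abstract describes — a particle-based exploration term, a contrastive diversity term, and a dynamic weight between them — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the k-nearest-neighbor particle estimator, the InfoNCE-style diversity term, and the scalar `alpha` passed in by the caller are all assumptions standing in for ComSD's actual learned contrastive networks and dynamic weighting mechanism.

```python
import numpy as np

def particle_exploration_reward(states, k=3):
    """Particle-based entropy estimate: reward each state by the log
    distance to its k-th nearest neighbor within the batch, so states
    in sparsely visited regions earn more reward."""
    # pairwise Euclidean distances within the batch, shape (B, B)
    d = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    d.sort(axis=1)
    kth = d[:, k]  # column 0 is the zero self-distance
    return np.log(1.0 + kth)

def contrastive_diversity_reward(state_emb, skill_emb, temp=0.5):
    """InfoNCE-style reward: a state is rewarded for being more similar
    to the embedding of its own skill than to the other skills in the
    batch, which promotes discriminability between skills."""
    logits = state_emb @ skill_emb.T / temp        # (B, B) similarities
    pos = np.diag(logits)                          # matched state/skill pairs
    m = logits.max(axis=1, keepdims=True)          # stable log-sum-exp
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return pos - lse                               # log-softmax of positives

def combined_intrinsic_reward(states, state_emb, skill_emb, alpha):
    """Weighted mix of exploration and diversity; in ComSD the weight is
    adjusted dynamically during training, here it is just an argument."""
    r_explore = particle_exploration_reward(states)
    r_diverse = contrastive_diversity_reward(state_emb, skill_emb)
    return alpha * r_explore + (1.0 - alpha) * r_diverse
```

Note the sign structure: the exploration term is non-negative (log of 1 plus a distance), while the contrastive term is a log-softmax and thus non-positive, so the weight `alpha` trades off how strongly the agent is pushed toward far-reaching states versus skill-consistent ones.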


Similar Articles

1. Balancing State Exploration and Skill Diversity in Unsupervised Skill Discovery. IEEE Trans Cybern. 2025 May;55(5):2234-2247. doi: 10.1109/TCYB.2025.3548821. Epub 2025 Apr 23.
2. CoSD: Balancing behavioral consistency and diversity in unsupervised skill discovery. Neural Netw. 2025 Feb;182:106889. doi: 10.1016/j.neunet.2024.106889. Epub 2024 Nov 12.
3. Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning. Neural Netw. 2025 Jan;181:106852. doi: 10.1016/j.neunet.2024.106852. Epub 2024 Nov 2.
4. Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration-exploitation balance with bio-inspired neural networks. Neural Netw. 2022 Jul;151:16-33. doi: 10.1016/j.neunet.2022.03.021. Epub 2022 Mar 23.
5. Exploratory State Representation Learning. Front Robot AI. 2022 Feb 14;9:762051. doi: 10.3389/frobt.2022.762051. eCollection 2022.
6. Task-Oriented Deep Reinforcement Learning for Robotic Skill Acquisition and Control. IEEE Trans Cybern. 2021 Feb;51(2):1056-1069. doi: 10.1109/TCYB.2019.2949596. Epub 2021 Jan 15.
7. Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning. Sensors (Basel). 2022 Aug 29;22(17):6504. doi: 10.3390/s22176504.
8. Incremental learning of skill collections based on intrinsic motivation. Front Neurorobot. 2013 Jul 26;7:11. doi: 10.3389/fnbot.2013.00011. eCollection 2013.
9. Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4776-4790. doi: 10.1109/TNNLS.2021.3129160. Epub 2023 Aug 4.
10. Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle. Neural Comput. 2024 Aug 19;36(9):1854-1885. doi: 10.1162/neco_a_01690.