

Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning.

Authors

Li Tong, Bai Chenjia, Xu Kang, Chu Chen, Zhu Peican, Wang Zhen

Affiliations

School of Cybersecurity, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.

Institute of Artificial Intelligence (TeleAI), China Telecom, 200232, Shanghai, China.

Publication

Neural Netw. 2025 Jan;181:106852. doi: 10.1016/j.neunet.2024.106852. Epub 2024 Nov 2.

DOI:10.1016/j.neunet.2024.106852
PMID:39522419
Abstract

With the growing ubiquity of intelligent systems, the need for cooperation between intelligent machines has broadened research on cooperative multi-agent reinforcement learning (MARL). Existing approaches typically address this challenge through task decomposition of the environment or role classification of agents. However, these methods often rely on parameter sharing between agents, which leads to homogeneous agent behavior and is ineffective for complex tasks; alternatively, training that relies on external rewards struggles to adapt to sparse-reward scenarios. Motivated by these challenges, we propose a novel dynamic skill learning (DSL) framework that enables agents to learn more diverse abilities driven by intrinsic rewards. Specifically, DSL has two components: (i) dynamic skill discovery, which encourages the emergence of meaningful skills by exploring the environment in an unsupervised manner, using the inner product between a skill vector and a trajectory representation to generate intrinsic rewards, while a Lipschitz constraint on the state representation function ensures proper trajectories for the learned skills; and (ii) dynamic skill assignment, which uses a policy controller to assign skills to each agent based on its trajectory latent variables. In addition, to avoid training instability caused by frequent changes in skill selection, we introduce a regularization term that limits skill switching between adjacent time steps. We thoroughly evaluated DSL on two challenging benchmarks, StarCraft II and Google Research Football. Experimental results show that, compared with strong baselines such as QMIX and RODE, DSL effectively improves performance and adapts better to difficult cooperative scenarios.
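The abstract describes two computational ideas concretely enough to sketch: an intrinsic reward formed from the inner product of a skill vector and a trajectory representation, and a regularization penalty that discourages skill switching between adjacent time steps. The sketch below is a minimal illustration of those two quantities only, not the authors' implementation; the function names, shapes, and the `coeff` hyperparameter are assumptions for illustration.

```python
import numpy as np

def intrinsic_reward(skill_vec, traj_repr):
    """Intrinsic reward as the inner product between a skill vector z
    and a trajectory representation phi(tau), as described in the
    abstract. Both inputs are hypothetical fixed-length embeddings."""
    return float(np.dot(skill_vec, traj_repr))

def switch_penalty(skill_prev, skill_curr, coeff=0.1):
    """Regularization term penalizing a change of skill between
    adjacent time steps; coeff is an assumed hyperparameter, and the
    paper's actual term may be a soft penalty over skill distributions."""
    return coeff * float(skill_prev != skill_curr)

# Illustrative use: a trajectory aligned with the skill direction earns
# a higher intrinsic reward, and keeping the same skill avoids the penalty.
z = np.array([1.0, 0.0])
phi = np.array([0.5, 2.0])
r = intrinsic_reward(z, phi) - switch_penalty(skill_prev=3, skill_curr=3)
```

In the full method, the Lipschitz constraint on the state representation function (e.g., via weight normalization) would bound how fast `phi` can change between states, which this sketch omits.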


Similar Articles

1
Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning.
Neural Netw. 2025 Jan;181:106852. doi: 10.1016/j.neunet.2024.106852. Epub 2024 Nov 2.
2
Hierarchical task network-enhanced multi-agent reinforcement learning: Toward efficient cooperative strategies.
Neural Netw. 2025 Jun;186:107254. doi: 10.1016/j.neunet.2025.107254. Epub 2025 Feb 11.
3
MuDE: Multi-agent decomposed reward-based exploration.
Neural Netw. 2024 Nov;179:106565. doi: 10.1016/j.neunet.2024.106565. Epub 2024 Jul 22.
4
Generative subgoal oriented multi-agent reinforcement learning through potential field.
Neural Netw. 2024 Nov;179:106552. doi: 10.1016/j.neunet.2024.106552. Epub 2024 Jul 17.
5
CoSD: Balancing behavioral consistency and diversity in unsupervised skill discovery.
Neural Netw. 2025 Feb;182:106889. doi: 10.1016/j.neunet.2024.106889. Epub 2024 Nov 12.
6
Credit assignment with predictive contribution measurement in multi-agent reinforcement learning.
Neural Netw. 2023 Jul;164:681-690. doi: 10.1016/j.neunet.2023.05.021. Epub 2023 May 20.
7
TIMAR: Transition-informed representation for sample-efficient multi-agent reinforcement learning.
Neural Netw. 2025 Apr;184:107081. doi: 10.1016/j.neunet.2024.107081. Epub 2024 Dec 31.
8
Constraining an Unconstrained Multi-agent Policy with offline data.
Neural Netw. 2025 Jun;186:107253. doi: 10.1016/j.neunet.2025.107253. Epub 2025 Feb 13.
9
LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.
Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
10
GeneWorker: An end-to-end robotic reinforcement learning approach with collaborative generator and worker networks.
Neural Netw. 2024 Oct;178:106472. doi: 10.1016/j.neunet.2024.106472. Epub 2024 Jun 18.

Cited By

1
Incentivising cooperation by judging a group's performance by its weakest member in neuroevolution and reinforcement learning.
Front Robot AI. 2025 Jul 25;12:1599676. doi: 10.3389/frobt.2025.1599676. eCollection 2025.