
Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning.

Authors

Li Tong, Bai Chenjia, Xu Kang, Chu Chen, Zhu Peican, Wang Zhen

Affiliations

School of Cybersecurity, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.

Institute of Artificial Intelligence (TeleAI), China Telecom, 200232, Shanghai, China.

Publication

Neural Netw. 2025 Jan;181:106852. doi: 10.1016/j.neunet.2024.106852. Epub 2024 Nov 2.

Abstract

With the growing ubiquity of intelligent systems, the need for cooperation among intelligent machines has broadened research into cooperative multi-agent reinforcement learning (MARL). Existing approaches typically address this challenge through task decomposition of the environment or role classification of agents. However, these methods may rely on parameter sharing between agents, which leads to homogeneous agent behavior and is ineffective for complex tasks; alternatively, training driven by external rewards struggles to adapt to sparse-reward scenarios. Motivated by these challenges, we propose a novel dynamic skill learning (DSL) framework that enables agents to learn more diverse abilities driven by intrinsic rewards. Specifically, DSL has two components: (i) dynamic skill discovery, which encourages the emergence of meaningful skills through unsupervised exploration of the environment, using the inner product between a skill vector and a trajectory representation to generate intrinsic rewards; a Lipschitz constraint on the state representation function ensures that learned skills yield well-behaved trajectories. (ii) Dynamic skill assignment, which uses a policy controller to assign a skill to each agent based on its trajectory latent variable. In addition, to avoid the training instability caused by frequent changes in skill selection, we introduce a regularization term that limits skill switching between adjacent time steps. We evaluated DSL thoroughly on two challenging benchmarks, StarCraft II and Google Research Football. Experimental results show that, compared with strong baselines such as QMIX and RODE, DSL effectively improves performance and adapts better to difficult cooperative scenarios.
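The two intrinsic signals described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: the function names, vector shapes, and the indicator-style switching penalty are all assumptions; the actual method trains neural networks for the skill and trajectory representations.

```python
import numpy as np

def intrinsic_reward(skill_vec: np.ndarray, traj_repr: np.ndarray) -> float:
    """Intrinsic reward as the inner product between a skill vector and a
    trajectory representation, as described in the abstract. Both are
    assumed here to be fixed-length vectors of the same dimension."""
    return float(np.dot(skill_vec, traj_repr))

def skill_switch_penalty(skill_prev: int, skill_curr: int, coef: float = 0.1) -> float:
    """Assumed form of the regularization term: penalize changing the
    selected skill between adjacent time steps by a constant amount."""
    return coef * float(skill_prev != skill_curr)

# Usage: a skill vector aligned with the observed trajectory representation
# yields a larger intrinsic reward; switching skills incurs a penalty.
z = np.array([0.5, -0.2, 0.7])     # skill vector (assumed 3-dim)
phi = np.array([0.4, 0.1, 0.6])    # trajectory representation (assumed 3-dim)
r_int = intrinsic_reward(z, phi)
penalty = skill_switch_penalty(skill_prev=2, skill_curr=3)
```

The inner-product form rewards trajectories whose representation aligns with the chosen skill vector, which is what drives distinct skills toward distinct behaviors; the switching penalty discourages the controller from reassigning skills at every step.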

