Multiagent Continual Coordination via Progressive Task Contextualization.

Authors

Yuan Lei, Li Lihe, Zhang Ziqian, Zhang Fuxiang, Guan Cong, Yu Yang

Publication

IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6326-6340. doi: 10.1109/TNNLS.2024.3394513. Epub 2025 Apr 4.

DOI: 10.1109/TNNLS.2024.3394513
PMID: 38896515
Abstract

Cooperative multiagent reinforcement learning (MARL) has attracted significant attention and has the potential for many real-world applications. Previous arts mainly focus on facilitating the coordination ability from different aspects (e.g., nonstationarity and credit assignment) in single-task or multitask scenarios, ignoring the stream of tasks that appear in a continual manner. This ignorance makes the continual coordination an unexplored territory, neither in problem formulation nor efficient algorithms designed. Toward tackling the mentioned issue, this article proposes an approach, multiagent continual coordination via progressive task contextualization (MACPro). The key point lies in obtaining a factorized policy, using shared feature extraction layers but separated independent task heads, each specializing in a specific class of tasks. The task heads can be progressively expanded based on the learned task contextualization. Moreover, to cater to the popular centralized training with decentralized execution (CTDE) paradigm in MARL, each agent learns to predict and adopt the most relevant policy head based on local information in a decentralized manner. We show in multiple multiagent benchmarks that existing continual learning methods fail, while MACPro is able to achieve close-to-optimal performance. More results also disclose the effectiveness of MACPro from multiple aspects, such as high generalization ability.
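The factorized-policy idea described above (a shared feature-extraction trunk, per-task-class heads added progressively, and each agent selecting its head from local information) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `FactorizedPolicy`, `add_head`, and `select_head`, the tanh trunk, and the nearest-context head-selection rule are all hypothetical simplifications of MACPro's learned task contextualization.

```python
import numpy as np

rng = np.random.default_rng(0)

class FactorizedPolicy:
    """Toy sketch of a MACPro-style factorized policy: one shared
    feature extractor plus independent task heads that can be added
    progressively as new task classes arrive."""

    def __init__(self, obs_dim, hidden_dim, n_actions):
        self.hidden_dim = hidden_dim
        self.n_actions = n_actions
        # Shared feature-extraction layer, reused across all tasks.
        self.W_shared = rng.standard_normal((obs_dim, hidden_dim)) * 0.1
        self.heads = []     # one (hidden_dim, n_actions) matrix per task class
        self.contexts = []  # a learned task-context vector per head (assumed)

    def add_head(self, context):
        """Progressive expansion: attach a fresh head for a new task class."""
        self.heads.append(
            rng.standard_normal((self.hidden_dim, self.n_actions)) * 0.1)
        self.contexts.append(np.asarray(context, dtype=float))

    def select_head(self, local_context):
        """Decentralized head selection: pick the head whose stored task
        context is closest to the context inferred from local information."""
        dists = [np.linalg.norm(c - np.asarray(local_context, dtype=float))
                 for c in self.contexts]
        return int(np.argmin(dists))

    def act(self, obs, local_context):
        h = np.tanh(obs @ self.W_shared)    # shared features
        k = self.select_head(local_context)  # most relevant policy head
        logits = h @ self.heads[k]
        return k, int(np.argmax(logits))

policy = FactorizedPolicy(obs_dim=4, hidden_dim=8, n_actions=3)
policy.add_head(context=[1.0, 0.0])  # task class A
policy.add_head(context=[0.0, 1.0])  # task class B
# A local context near task class B routes the agent to head 1.
head, action = policy.act(np.ones(4), local_context=[0.1, 0.9])
```

In the paper, the task contexts are learned representations and head selection happens per agent at execution time, which is what keeps the scheme compatible with the CTDE paradigm mentioned in the abstract.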


Similar articles

1
Multiagent Continual Coordination via Progressive Task Contextualization.
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6326-6340. doi: 10.1109/TNNLS.2024.3394513. Epub 2025 Apr 4.
2
SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2023 Jan;34(1):52-63. doi: 10.1109/TNNLS.2021.3089493. Epub 2023 Jan 5.
3
Residual Q-Networks for Value Function Factorizing in Multiagent Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):1534-1544. doi: 10.1109/TNNLS.2022.3183865. Epub 2024 Feb 5.
4
UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios.
IEEE Trans Neural Netw Learn Syst. 2023 Apr;34(4):2093-2104. doi: 10.1109/TNNLS.2021.3105869. Epub 2023 Apr 4.
5
Lateral Transfer Learning for Multiagent Reinforcement Learning.
IEEE Trans Cybern. 2023 Mar;53(3):1699-1711. doi: 10.1109/TCYB.2021.3108237. Epub 2023 Feb 15.
6
Coordination as inference in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106101. doi: 10.1016/j.neunet.2024.106101. Epub 2024 Jan 11.
7
A Local Information Aggregation-Based Multiagent Reinforcement Learning for Robot Swarm Dynamic Task Allocation.
IEEE Trans Neural Netw Learn Syst. 2025 Jun;36(6):10437-10449. doi: 10.1109/TNNLS.2025.3558282.
8
TVDO: Tchebycheff Value-Decomposition Optimization for Multiagent Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12521-12534. doi: 10.1109/TNNLS.2024.3455422.
9
Fully Decentralized Multiagent Communication via Causal Inference.
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10193-10202. doi: 10.1109/TNNLS.2022.3165114. Epub 2023 Nov 30.
10
Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9044-9056. doi: 10.1109/TNNLS.2024.3420791. Epub 2025 May 2.