Multi-Task Multi-Agent Reinforcement Learning With Interaction and Task Representations.

Author information

Li Chao, Dong Shaokang, Yang Shangdong, Hu Yujing, Ding Tianyu, Li Wenbin, Gao Yang

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):13431-13445. doi: 10.1109/TNNLS.2024.3475216.

Abstract

Multi-task multi-agent reinforcement learning (MT-MARL) can leverage useful knowledge across multiple related tasks to improve performance on any single task. While recent studies have tentatively achieved this by learning independent policies on a shared representation space, we argue that further advances can be realized by explicitly characterizing agent interactions within these multi-agent tasks and identifying task relations for selective reuse. To this end, this article proposes Representing Interactions and Tasks (RIT), a novel MT-MARL algorithm that characterizes both intra-task agent interactions and inter-task relations. Specifically, to characterize agent interactions, RIT introduces an interactive value decomposition that explicitly incorporates the dependencies among agents into policy learning. Theoretical analysis shows that the learned utility value of each agent approximates its Shapley value and thus represents agent interactions. Moreover, we learn task representations from per-agent local trajectories; these representations are used to assess task similarities and thereby identify task relations. As a result, RIT facilitates the effective transfer of interaction knowledge across similar multi-agent tasks. Structurally, RIT develops a universal policy structure for scalable multi-task policy learning. We evaluate RIT against multiple state-of-the-art baselines on various cooperative tasks, and its strong performance in both multi-task and zero-shot settings demonstrates its effectiveness.
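The abstract states that each agent's learned utility value approximates its Shapley value. As a reference point for what that target quantity is, the following is a minimal sketch that computes exact Shapley values for a tiny, hypothetical cooperative game by averaging each agent's marginal contribution over all orderings of the agents. It is not RIT's interactive value decomposition, which the abstract does not detail; the function names and the toy reward are illustrative assumptions. Enumerating all orderings scales factorially with the number of agents, so exact computation is only practical for a handful of agents.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players (a coalition) to its team reward.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the agents before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_team_value(coalition):
    # Hypothetical toy game: agents 0 and 1 earn 10 only if both act
    # together (an interaction); agent 2 independently earns 2.
    reward = 10.0 if {0, 1} <= coalition else 0.0
    if 2 in coalition:
        reward += 2.0
    return reward


phi = shapley_values([0, 1, 2], toy_team_value)
print(phi)                # {0: 5.0, 1: 5.0, 2: 2.0}
print(sum(phi.values()))  # 12.0, equal to the full team's reward
```

In this toy game the interacting pair splits its joint reward equally, while the independent agent keeps exactly its own contribution, and the values sum to the reward of the full team (the efficiency property of the Shapley value).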
