Knowledge Reuse of Multi-Agent Reinforcement Learning in Cooperative Tasks

Authors

Shi Daming, Tong Junbo, Liu Yi, Fan Wenhui

Affiliation

Department of Automation, Tsinghua University, Beijing 100084, China.

Publication

Entropy (Basel). 2022 Mar 28;24(4):470. doi: 10.3390/e24040470.

DOI: 10.3390/e24040470
PMID: 35455134
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9025018/
Abstract

With the development and application of multi-agent systems, multi-agent cooperation is becoming an important problem in artificial intelligence. Multi-agent reinforcement learning (MARL) is one of the most effective methods for solving multi-agent cooperative tasks. However, the huge sample complexity of traditional reinforcement learning methods results in two kinds of training waste in MARL for cooperative tasks: all homogeneous agents are trained independently and repetitively, and the multi-agent system must be trained from scratch when a new teammate is added. To tackle these two problems, we propose knowledge reuse methods for MARL. On the one hand, this paper proposes sharing experience and policy among agents to mitigate training waste. On the other hand, this paper proposes reusing the policies learned by the original team to avoid knowledge waste when adding a new agent. Experimentally, the Pursuit task demonstrates that sharing experience and policy can simultaneously accelerate training and enhance performance. Additionally, transferring the policies learned by an N-agent team enables the (N+1)-agent team to immediately perform the cooperative task successfully, and only minor training resources are needed for the multi-agent team to reach the same optimal performance as training from scratch.

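The two reuse mechanisms the abstract describes, pooling experience and weights across homogeneous agents during training, and initializing a newcomer from a trained teammate's policy when the team grows from N to N+1 agents, can be sketched as below. This is a minimal toy with a tabular Q-function standing in for the paper's learned policy networks; the environment dynamics, reward, and all names here are illustrative, not the paper's actual Pursuit setup.

```python
import copy
import random
from collections import deque


class SharedReplayBuffer:
    """One buffer pooling transitions from all homogeneous agents
    (experience sharing)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


class TabularPolicy:
    """Toy Q-table standing in for a learned policy network."""

    def __init__(self, n_states, n_actions):
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def act(self, state):
        row = self.q[state]
        return row.index(max(row))

    def update(self, s, a, r, s2, alpha=0.1, gamma=0.9):
        # One-step Q-learning backup on a sampled transition.
        self.q[s][a] += alpha * (r + gamma * max(self.q[s2]) - self.q[s][a])


def train_shared(n_agents, n_states=4, n_actions=2, steps=200, seed=0):
    """Policy sharing: one set of weights updated with every agent's data."""
    random.seed(seed)
    buffer = SharedReplayBuffer()
    policy = TabularPolicy(n_states, n_actions)
    for _ in range(steps):
        for _agent in range(n_agents):
            # Every agent writes its transition into the same buffer.
            s = random.randrange(n_states)
            a = random.randrange(n_actions)
            r = 1.0 if a == s % n_actions else 0.0  # toy reward
            s2 = random.randrange(n_states)
            buffer.add((s, a, r, s2))
        for s, a, r, s2 in buffer.sample(8):
            # The shared policy learns from the pooled experience.
            policy.update(s, a, r, s2)
    return policy


def add_teammate(team_policies):
    """Policy reuse when growing from N to N+1 agents: the newcomer starts
    from a copy of a trained teammate's policy instead of from scratch."""
    return team_policies + [copy.deepcopy(team_policies[-1])]


team = [train_shared(n_agents=3)]
team = add_teammate(team)
assert team[1].q == team[0].q  # newcomer inherits the learned policy
```

In the paper's setting the same two ideas apply to network weights rather than a Q-table: homogeneous agents share parameters and a replay buffer during training, and the (N+1)-th agent is warm-started from the N-agent team's policies, which is why it can act successfully before any fine-tuning.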

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/a8aed3b8b019/entropy-24-00470-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/98e9482782b6/entropy-24-00470-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/8ed23f9db503/entropy-24-00470-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/2872656fe641/entropy-24-00470-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/6b219edbf175/entropy-24-00470-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/d5be05c13f8c/entropy-24-00470-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/3c7db9c02bc2/entropy-24-00470-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/4607f1370d3c/entropy-24-00470-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/58b340247feb/entropy-24-00470-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/853c8dd9b525/entropy-24-00470-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/a49dac7873c9/entropy-24-00470-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/d0a79909d625/entropy-24-00470-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37c7/9025018/d793e1d92bb1/entropy-24-00470-g013.jpg

Similar Articles

1. Knowledge Reuse of Multi-Agent Reinforcement Learning in Cooperative Tasks.
Entropy (Basel). 2022 Mar 28;24(4):470. doi: 10.3390/e24040470.
2. KnowRU: Knowledge Reuse via Knowledge Distillation in Multi-Agent Reinforcement Learning.
Entropy (Basel). 2021 Aug 13;23(8):1043. doi: 10.3390/e23081043.
3. An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control.
Neural Netw. 2024 Feb;170:610-621. doi: 10.1016/j.neunet.2023.11.046. Epub 2023 Nov 23.
4. Decentralized multi-agent reinforcement learning based on best-response policies.
Front Robot AI. 2024 Apr 16;11:1229026. doi: 10.3389/frobt.2024.1229026. eCollection 2024.
5. Lateral Transfer Learning for Multiagent Reinforcement Learning.
IEEE Trans Cybern. 2023 Mar;53(3):1699-1711. doi: 10.1109/TCYB.2021.3108237. Epub 2023 Feb 15.
6. HyperComm: Hypergraph-based communication in multi-agent reinforcement learning.
Neural Netw. 2024 Oct;178:106432. doi: 10.1016/j.neunet.2024.106432. Epub 2024 Jun 10.
7. LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.
Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
8. Strangeness-driven exploration in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
9. Coordination as inference in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106101. doi: 10.1016/j.neunet.2024.106101. Epub 2024 Jan 11.
10. Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning.
Entropy (Basel). 2022 May 30;24(6):774. doi: 10.3390/e24060774.

References Cited in This Article

1. Model Learning and Knowledge Sharing for Cooperative Multiagent Systems in Stochastic Environment.
IEEE Trans Cybern. 2021 Dec;51(12):5717-5727. doi: 10.1109/TCYB.2019.2958912. Epub 2021 Dec 22.
2. Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.