Decentralized multi-agent reinforcement learning based on best-response policies.

Authors

Gabler Volker, Wollherr Dirk

Affiliation

Chair of Automatic Control Engineering, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.

Publication

Front Robot AI. 2024 Apr 16;11:1229026. doi: 10.3389/frobt.2024.1229026. eCollection 2024.

DOI:10.3389/frobt.2024.1229026
PMID:38690119
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11059992/
Abstract

Multi-agent systems form an interdisciplinary research field concerned with multiple decision-making individuals interacting with a usually partially observable environment. Following recent advances in single-agent reinforcement learning (RL), multi-agent reinforcement learning (MARL) has attracted tremendous interest in recent years. Most studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems. In contrast, we argue that a decentralized learning scheme is preferable for real-world applications, as it allows a learning algorithm to be deployed on an individual robot rather than on a complete fleet of robots. Therefore, this article outlines a novel actor-critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and agent-based costs, as commonly applied in multi-robot planning. On the one hand, the agent-based critic aims to decrease agent-specific costs. On the other hand, each agent aims to optimize the joint team reward based on the joint task critic. As this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Following these behavior models, our algorithm allows fully decentralized execution and training. We evaluate the presented method with the proposed behavior models in a sparsely rewarded simulated multi-agent environment. Although our approach already outperforms state-of-the-art learners, we conclude by outlining possible extensions of our algorithm that future research may build upon.
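The abstract combines two ideas that a small example can make concrete: a per-agent task critic that depends on the joint action, and an agent-based cost critic, evaluated under a behavior model for the other agents. The Python sketch below is a tabular, two-agent toy built on our own assumptions, not the authors' algorithm. The names `nature_values`, `leader_values`, and `greedy_action` are hypothetical; the "nature" model is read here as a non-reactive partner with a fixed action distribution; the Stackelberg model assumes the follower best-responds on the shared task critic; and the cost critic is assumed to depend only on the agent's own action and is simply subtracted.

```python
import numpy as np


def nature_values(q_task: np.ndarray, other_probs: np.ndarray) -> np.ndarray:
    """Score each own action against a non-strategic 'nature' player.

    Assumption: the other agent's action distribution `other_probs` does not
    react to our choice, so each own action is scored by its expected value
    under the joint task critic q_task[a_own, a_other].
    """
    return q_task @ other_probs


def leader_values(q_task: np.ndarray) -> np.ndarray:
    """Score each own action as a Stackelberg leader.

    Assumption: the other agent observes our action and best-responds with
    respect to the same (shared, cooperative) task critic.
    """
    follower_br = q_task.argmax(axis=1)  # follower's best response to each leader action
    return q_task[np.arange(q_task.shape[0]), follower_br]


def greedy_action(task_values: np.ndarray, q_cost: np.ndarray) -> int:
    """Combine task values with the agent-based cost critic.

    Assumption: the agent-specific cost is simply subtracted from the task value.
    """
    return int(np.argmax(task_values - q_cost))


# Toy two-agent example (all numbers are made up for illustration).
q_task = np.array([[1.0, 0.0],
                   [0.0, 0.5]])      # rows: own action, cols: other agent's action
q_cost = np.array([0.0, 0.0])        # hypothetical cost critic, one entry per own action
other_probs = np.array([0.2, 0.8])   # assumed current policy of the other agent

a_nature = greedy_action(nature_values(q_task, other_probs), q_cost)  # → 1
a_leader = greedy_action(leader_values(q_task), q_cost)               # → 0
```

With these toy numbers the two behavior models disagree: against a non-reactive partner that mostly plays its second action, the agent prefers its own second action, whereas as a leader expecting a cooperative best response it aims for the higher-value joint outcome. With more than two agents, a dyadic model would repeat the leader step against each other agent; how the pairwise results are combined is not specified in the abstract and is left out here. The paper's method is an actor-critic learner; this sketch only illustrates the action-selection step on fixed tables.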

Figure images:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9277/11059992/46f58e034ced/frobt-11-1229026-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9277/11059992/aba298abcce7/frobt-11-1229026-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9277/11059992/ac81a37786c5/frobt-11-1229026-g003.jpg

Similar articles

1
Decentralized multi-agent reinforcement learning based on best-response policies.
Front Robot AI. 2024 Apr 16;11:1229026. doi: 10.3389/frobt.2024.1229026. eCollection 2024.
2
Coordination as inference in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106101. doi: 10.1016/j.neunet.2024.106101. Epub 2024 Jan 11.
3
Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning.
Entropy (Basel). 2022 May 30;24(6):774. doi: 10.3390/e24060774.
4
Strangeness-driven exploration in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
5
IHG-MA: Inductive heterogeneous graph multi-agent reinforcement learning for multi-intersection traffic signal control.
Neural Netw. 2021 Jul;139:265-277. doi: 10.1016/j.neunet.2021.03.015. Epub 2021 Mar 22.
6
Structured Cooperative Reinforcement Learning With Time-Varying Composite Action Space.
IEEE Trans Pattern Anal Mach Intell. 2022 Nov;44(11):8618-8634. doi: 10.1109/TPAMI.2021.3102140. Epub 2022 Oct 4.
7
Credit assignment with predictive contribution measurement in multi-agent reinforcement learning.
Neural Netw. 2023 Jul;164:681-690. doi: 10.1016/j.neunet.2023.05.021. Epub 2023 May 20.
8
Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey.
Sensors (Basel). 2023 Mar 30;23(7):3625. doi: 10.3390/s23073625.
9
Optimistic sequential multi-agent reinforcement learning with motivational communication.
Neural Netw. 2024 Nov;179:106547. doi: 10.1016/j.neunet.2024.106547. Epub 2024 Jul 22.
10
Graph Soft Actor-Critic Reinforcement Learning for Large-Scale Distributed Multirobot Coordination.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):665-676. doi: 10.1109/TNNLS.2023.3329530. Epub 2025 Jan 7.
