Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization.

Author information

Li Simin, Xu Ruixiao, Xiu Jingqiao, Zheng Yuwei, Feng Pu, Ma Yuqing, An Bo, Yang Yaodong, Liu Xianglong

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Oct;36(10):18118-18132. doi: 10.1109/TNNLS.2025.3577259.

DOI:10.1109/TNNLS.2025.3577259
PMID:40633025
Abstract

In cooperative multi-agent reinforcement learning (MARL), robustness against cooperative agents that take unpredictable or worst-case adversarial actions is crucial for real-world deployment. In multi-agent settings, each agent may be perturbed or unperturbed, so the number of potential threat scenarios grows exponentially with the number of agents. Existing robust MARL methods either enumerate or approximate all possible threat scenarios, which incurs heavy computation and yields insufficient robustness. In contrast, humans develop robust behaviors by maintaining a general level of caution rather than preparing for every possible threat. Inspired by human decision making, we frame robust MARL as a control-as-inference problem and implicitly optimize worst-case robustness across all threat scenarios through off-policy evaluation. Specifically, we introduce mutual information regularization as robust regularization (MIR3), which maximizes a lower bound on robustness during routine training, serving as a form of caution for MARL without requiring adversarial inputs. Further analysis shows that MIR3 acts as an information bottleneck, preventing agents from over-reacting to others and aligning policies with robust action priors. In the presence of worst-case adversaries, MIR3 significantly surpasses baseline methods in robustness and training efficiency while maintaining cooperative performance in StarCraft II, quadrotor swarm control, and robot swarm control. When the robot swarm control algorithm is deployed in the real world, our method also outperforms the best baseline by 14.29% in reward. See code and demo videos at https://github.com/DIG-Beihang/MIR3.
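The abstract describes MIR3 only at a high level. As a rough illustration of what a mutual-information regularizer acting as an information bottleneck computes, the sketch below assumes discrete actions and a plug-in estimate of I(H; A) = H(A) - E_h[H(A | h)] over a batch of per-history action distributions, subtracted from the return with a weight alpha. The function names, the uniform-batch estimator, and the penalty form are hypothetical choices for illustration, not the authors' implementation (the linked repository holds that).

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy, in nats, of one discrete action distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def mutual_information(policies: Sequence[Sequence[float]]) -> float:
    """Plug-in estimate of I(H; A) = H(A) - E_h[H(A | h)].

    `policies[i]` is the action distribution pi(. | h_i) for the i-th history
    in a batch; histories are assumed equally likely (hypothetical setup).
    """
    n = len(policies)
    num_actions = len(policies[0])
    # Marginal action distribution p(a) = mean over histories of pi(a | h).
    marginal = [sum(pi[a] for pi in policies) / n for a in range(num_actions)]
    # Expected conditional entropy E_h[H(A | h)].
    conditional = sum(entropy(pi) for pi in policies) / n
    return entropy(marginal) - conditional


def regularized_objective(mean_return: float,
                          policies: Sequence[Sequence[float]],
                          alpha: float) -> float:
    """Hypothetical bottleneck-style objective: return minus alpha * I(H; A).

    A larger alpha pushes each pi(. | h) toward the shared marginal p(a),
    i.e. toward actions that depend less on the history (including other
    agents' behaviour) and more on a fixed action prior.
    """
    return mean_return - alpha * mutual_information(policies)


if __name__ == "__main__":
    reactive = [[1.0, 0.0], [0.0, 1.0]]  # action fully determined by history
    cautious = [[0.5, 0.5], [0.5, 0.5]]  # action ignores history
    print(mutual_information(reactive), mutual_information(cautious))
    print(regularized_objective(1.0, reactive, alpha=0.5))
```

In this toy batch the history-reactive policy carries ln 2 nats of mutual information and the history-independent one carries none, so the penalty increasingly favours the latter as alpha grows.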

Similar articles

1
Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization.
IEEE Trans Neural Netw Learn Syst. 2025 Oct;36(10):18118-18132. doi: 10.1109/TNNLS.2025.3577259.
2
Prescription of Controlled Substances: Benefits and Risks
3
Vesicoureteral Reflux
4
Shoulder Arthrogram
5
Attacking cooperative multi-agent reinforcement learning by adversarial minority influence.
Neural Netw. 2025 Nov;191:107747. doi: 10.1016/j.neunet.2025.107747. Epub 2025 Jun 21.
6
Mid Forehead Brow Lift
7
Representation-driven sampling and adaptive policy resetting for improving multi-Agent reinforcement learning.
Neural Netw. 2025 Jul 15;192:107875. doi: 10.1016/j.neunet.2025.107875.
8
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
9
Sexual Harassment and Prevention Training
10
A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.
Health Technol Assess. 2001;5(32):1-195. doi: 10.3310/hta5320.