TVDO: Tchebycheff Value-Decomposition Optimization for Multiagent Reinforcement Learning.

Authors

Hu Xiaoliang, Guo Pengcheng, Li Yadong, Li Guangyu, Cui Zhen, Yang Jian

Publication

IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12521-12534. doi: 10.1109/TNNLS.2024.3455422.

DOI: 10.1109/TNNLS.2024.3455422
PMID: 39302794
Abstract

In cooperative multiagent reinforcement learning (MARL), centralized training with decentralized execution (CTDE) has recently attracted growing attention owing to practical demands. However, its central dilemma is the inconsistency between jointly trained policies and individually executed actions. In this article, we propose a factorized Tchebycheff value-decomposition optimization (TVDO) method to overcome this inconsistency. In particular, inspired by the Tchebycheff method of multiobjective optimization (MOO), we formulate a nonlinear Tchebycheff aggregation function that realizes the global optimum by tightly constraining the upper bound of the individual action-value bias. We theoretically prove that, without extra restrictions, the factorized value decomposition with Tchebycheff aggregation satisfies the sufficient and necessary conditions of individual-global-max (IGM), which guarantees consistency between the global and individual optimal action-value functions. Empirically, in the climb and penalty games, we verify that TVDO precisely expresses the global-to-individual value decomposition with a guarantee of policy consistency. We also evaluate TVDO on the StarCraft multiagent challenge (SMAC) benchmark, where extensive experiments demonstrate that TVDO significantly outperforms several state-of-the-art MARL baselines.
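
The inconsistency the abstract describes can be illustrated on a small matrix game. The sketch below is not the TVDO method: the abstract does not give TVDO's aggregation function, so nothing here reproduces it. The payoff values are assumed to be those of the climb game as it commonly appears in the MARL literature; they are not stated in this record. The helper names (`additive_fit`, `igm_holds`, `tchebycheff`) are hypothetical. The code fits a purely additive decomposition to the joint payoff and checks whether the agents' individual greedy actions recover the joint optimum, which is what IGM requires. It also shows the generic weighted Tchebycheff scalarization from MOO, which takes the largest weighted deviation from an ideal point.

```python
# Climb game payoffs (ASSUMED from the standard literature, not from this record).
# Rows index agent 1's action, columns index agent 2's action.
CLIMB = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def joint_argmax(q_tot):
    """(row, col) of the largest entry of a joint action-value table."""
    cells = [(i, j) for i in range(len(q_tot)) for j in range(len(q_tot[0]))]
    return max(cells, key=lambda ij: q_tot[ij[0]][ij[1]])


def additive_fit(q_tot):
    """Least-squares fit q_tot[i][j] ~ q1[i] + q2[j] with uniform weights.

    For a full two-way table this is row mean + column mean - grand mean;
    the grand mean is split evenly between the two agents.
    """
    n_rows, n_cols = len(q_tot), len(q_tot[0])
    grand = sum(sum(row) for row in q_tot) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand / 2 for row in q_tot]
    q2 = [
        sum(q_tot[i][j] for i in range(n_rows)) / n_rows - grand / 2
        for j in range(n_cols)
    ]
    return q1, q2


def igm_holds(q_tot, q1, q2):
    """True if the individual greedy actions form the joint greedy action."""
    return joint_argmax(q_tot) == (argmax(q1), argmax(q2))


def tchebycheff(objectives, ideal, weights):
    """Generic weighted Tchebycheff scalarization: max_i w_i * |f_i - z_i|."""
    return max(w * abs(f - z) for f, z, w in zip(objectives, ideal, weights))


if __name__ == "__main__":
    q1, q2 = additive_fit(CLIMB)
    print("joint optimum:", joint_argmax(CLIMB))
    print("individual greedy actions:", (argmax(q1), argmax(q2)))
    print("IGM holds for additive fit:", igm_holds(CLIMB, q1, q2))
```

Under these assumed payoffs, the additive fit steers both agents toward the safe third action, while the joint optimum is the first action for both. This is the kind of train/execute mismatch that an IGM-satisfying decomposition is meant to rule out.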
