
Robust Reward-Free Actor-Critic for Cooperative Multiagent Reinforcement Learning.

Author Information

Lin Qifeng, Ling Qing

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17318-17329. doi: 10.1109/TNNLS.2023.3302131. Epub 2024 Dec 2.

Abstract

In this article, we consider centralized training and decentralized execution (CTDE) with diverse and private reward functions in cooperative multiagent reinforcement learning (MARL). The main challenge is that an unknown number of agents, whose identities are also unknown, can deliberately generate malicious messages and transmit them to the central controller. We term these malicious actions as Byzantine attacks. First, without Byzantine attacks, we propose a reward-free deep deterministic policy gradient (RF-DDPG) algorithm, in which gradients of agents' critics rather than rewards are sent to the central controller for preserving privacy. Second, to cope with Byzantine attacks, we develop a robust extension of RF-DDPG termed R2F-DDPG, which replaces the vulnerable average aggregation rule with robust ones. We propose a novel class of RL-specific Byzantine attacks that fail conventional robust aggregation rules, motivating the projection-boosted robust aggregation rules for R2F-DDPG. Numerical experiments show that RF-DDPG successfully trains agents to work cooperatively and that R2F-DDPG demonstrates robustness to Byzantine attacks.
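To make the aggregation idea concrete, here is a minimal illustrative sketch (not the paper's implementation, and not its projection-boosted rules): a central controller combines per-agent critic gradients, where plain averaging is easily skewed by a single Byzantine agent, while a classical robust rule such as the coordinate-wise median stays near the honest consensus. All function names and the toy gradient values below are assumptions for illustration.

```python
# Illustrative comparison of average vs. coordinate-wise median
# aggregation of per-agent gradient vectors at a central controller.
# This is a hypothetical sketch, not the R2F-DDPG aggregation rule.

def average_aggregate(gradients):
    """Coordinate-wise mean across agents (vulnerable to outliers)."""
    n = len(gradients)
    return [sum(g[i] for g in gradients) / n for i in range(len(gradients[0]))]

def median_aggregate(gradients):
    """Coordinate-wise median across agents (robust to a minority of
    Byzantine agents sending arbitrary values)."""
    result = []
    for i in range(len(gradients[0])):
        vals = sorted(g[i] for g in gradients)
        m = len(vals)
        mid = m // 2
        result.append(vals[mid] if m % 2 else (vals[mid - 1] + vals[mid]) / 2)
    return result

# Three honest agents report similar gradients; one Byzantine agent
# transmits a large malicious vector to the controller.
honest = [[0.9, -1.1], [1.0, -1.0], [1.1, -0.9]]
byzantine = [[100.0, 100.0]]
grads = honest + byzantine

avg = average_aggregate(grads)  # pulled far from the honest consensus
med = median_aggregate(grads)   # stays close to the honest gradients
```

Here a single attacker drags each coordinate of the average to roughly 25, while the median remains near the honest agents' values, which is the intuition behind replacing the vulnerable average rule with robust aggregation.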
