
Expected Policy Gradient for Network Aggregative Markov Games in Continuous Space

Authors

Alireza Ramezani Moghaddam, Hamed Kebriaei

Publication

IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):7372-7381. doi: 10.1109/TNNLS.2024.3387871. Epub 2025 Apr 4.

Abstract

In this article, we investigate the Nash-seeking problem for a set of agents playing an infinite network aggregative Markov game. In particular, we focus on a noncooperative framework in which each agent selfishly aims to maximize its long-term average reward without explicit knowledge of the environment dynamics or of its own reward function. The main contribution of this article is a continuous multiagent reinforcement learning (MARL) algorithm, with a convergence guarantee, for the Nash-seeking problem in infinite dynamic games. To this end, we propose an actor-critic MARL algorithm based on the expected policy gradient (EPG), with two general function approximators that estimate the value function and the Nash policy of the agents. We consider continuous state and action spaces and adopt a recently proposed EPG to reduce the variance of the gradient approximation. Based on this formulation and under some conventional assumptions (e.g., linear function approximators), we prove that the policies of the agents converge to the unique Nash equilibrium (NE) of the game. Furthermore, an estimation error analysis investigates the effect of the error arising from function approximation. As a case study, the framework is applied to a cloud radio access network (C-RAN), modeling the remote radio heads (RRHs) as the agents and the congestion of the baseband units (BBUs) as the environment dynamics.
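The actor-critic scheme described in the abstract can be illustrated, for a single agent, with a minimal sketch. This is not the paper's algorithm: it assumes a scalar Gaussian policy with a linear mean, a linear critic over hand-crafted features, an average-reward TD(0) critic update, and it approximates the expected policy gradient by averaging the score-function gradient over many fresh action samples from the current policy instead of using only the single executed action. All names and hyperparameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(state, action):
    # hand-crafted features for the linear critic (illustrative choice)
    return np.array([1.0, state, action, state * action, action ** 2])

class EPGAgent:
    """Toy single-agent actor-critic with an expected-policy-gradient actor.

    Actor: Gaussian policy N(theta * state, sigma^2) over a continuous
    action space. Critic: linear in `features`. The policy gradient is
    approximated by averaging grad log pi(a|s) * Q(s, a) over many action
    samples drawn from the current policy, which reduces the variance
    relative to the single-sample stochastic policy gradient.
    """
    def __init__(self, lr_actor=0.01, lr_critic=0.02, sigma=0.5):
        self.theta = 0.0        # actor parameter: policy mean = theta * state
        self.w = np.zeros(5)    # critic weights
        self.sigma = sigma
        self.lr_a, self.lr_c = lr_actor, lr_critic

    def act(self, state):
        return self.theta * state + self.sigma * rng.standard_normal()

    def critic_update(self, state, action, reward, avg_reward):
        # TD(0)-style update for the long-term average-reward criterion
        phi = features(state, action)
        td_error = reward - avg_reward - self.w @ phi
        self.w += self.lr_c * td_error * phi
        return td_error

    def actor_update(self, state, n_samples=64):
        # Expected policy gradient: average over fresh action samples
        # (a Monte Carlo stand-in for integrating over the policy)
        mean = self.theta * state
        actions = mean + self.sigma * rng.standard_normal(n_samples)
        q_values = np.array([self.w @ features(state, a) for a in actions])
        # d/d theta of log N(a; theta * state, sigma^2)
        scores = (actions - mean) / self.sigma ** 2 * state
        self.theta += self.lr_a * np.mean(scores * q_values)
```

On a toy task with reward r = -(a - s)^2, the best mean action is a = s, so theta should approach 1 as training proceeds; the averaged gradient makes the per-step update noticeably smoother than a single-sample estimate.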

