
Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization

Authors

Chen Long, Hu Bin, Guan Zhi-Hong, Zhao Lian, Shen Xuemin

Publication

IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5374-5386. doi: 10.1109/TNNLS.2021.3070584. Epub 2022 Oct 5.

Abstract

In this article, we investigate the routing problem of packet networks through multiagent reinforcement learning (RL), which is a very challenging topic in distributed and autonomous networked systems. In specific, the routing problem is modeled as a networked multiagent partially observable Markov decision process (MDP). Since the MDP of a network node is not only affected by its neighboring nodes' policies but also the network traffic demand, it becomes a multitask learning problem. Inspired by recent success of RL and metalearning, we propose two novel model-free multiagent RL algorithms, named multiagent proximal policy optimization (MAPPO) and multiagent metaproximal policy optimization (meta-MAPPO), to optimize the network performances under fixed and time-varying traffic demand, respectively. A practicable distributed implementation framework is designed based on the separability of exploration and exploitation in training MAPPO. Compared with the existing routing optimization policies, our simulation results demonstrate the excellent performances of the proposed algorithms.
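Both proposed algorithms build on proximal policy optimization (PPO), whose core is the clipped surrogate objective that each agent would maximize for its local routing policy. As a minimal illustrative sketch (not the authors' implementation; the function name and toy inputs are assumptions for illustration), the clipped objective can be written as:

```python
import numpy as np

def ppo_clip_objective(log_probs_new, log_probs_old, advantages, eps=0.2):
    """Clipped surrogate objective from PPO.

    In a MAPPO-style scheme, each network node (agent) would maximize
    this objective for its own policy over locally observed samples.
    The inputs are per-timestep log-probabilities of the actions taken
    under the new and old policies, and advantage estimates.
    """
    # Importance-sampling ratio between new and old policies.
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] limits the policy update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (elementwise minimum) bound, averaged over samples.
    return np.mean(np.minimum(unclipped, clipped))

# Toy check: when the new policy equals the old one, every ratio is 1
# and the objective reduces to the mean advantage.
adv = np.array([1.0, -0.5, 2.0])
lp = np.log(np.array([0.3, 0.5, 0.2]))
print(ppo_clip_objective(lp, lp, adv))
```

A meta-learning variant would additionally adapt the policy initialization across traffic-demand tasks, but the per-task inner update keeps this same clipped form.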

