
Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization.

Authors

Chen Long, Hu Bin, Guan Zhi-Hong, Zhao Lian, Shen Xuemin

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5374-5386. doi: 10.1109/TNNLS.2021.3070584. Epub 2022 Oct 5.

DOI: 10.1109/TNNLS.2021.3070584
PMID: 33881997
Abstract

In this article, we investigate the routing problem of packet networks through multiagent reinforcement learning (RL), a very challenging topic in distributed and autonomous networked systems. Specifically, the routing problem is modeled as a networked multiagent partially observable Markov decision process (MDP). Since the MDP of a network node is affected not only by its neighboring nodes' policies but also by the network traffic demand, routing becomes a multitask learning problem. Inspired by the recent success of RL and metalearning, we propose two novel model-free multiagent RL algorithms, named multiagent proximal policy optimization (MAPPO) and multiagent metaproximal policy optimization (meta-MAPPO), to optimize network performance under fixed and time-varying traffic demand, respectively. A practicable distributed implementation framework is designed based on the separability of exploration and exploitation in training MAPPO. Simulation results comparing the proposed algorithms with existing routing optimization policies demonstrate their excellent performance.
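The abstract names proximal policy optimization as the basis of MAPPO but does not give the objective. As a rough illustration only, the sketch below computes the standard single-agent PPO clipped surrogate and applies it independently to each network node. It is not the authors' MAPPO or meta-MAPPO formulation; the function names, the per-node batch layout, and the default `clip_eps=0.2` are assumptions made for this example.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def per_node_objectives(batches, clip_eps=0.2):
    """Hypothetical helper: evaluate the objective separately for each node.

    batches maps a node id to a (new_logp, old_logp, advantages) tuple.
    Treating nodes independently is an assumption of this sketch.
    """
    return {
        node: ppo_clip_objective(*batch, clip_eps=clip_eps)
        for node, batch in batches.items()
    }
```

Clipping the ratio to `[1 - clip_eps, 1 + clip_eps]` limits how far a single update can move a node's policy from the one that collected the data, which is the general motivation for PPO-style updates.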

