Suppr 超能文献


Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation.

Authors

Salimibeni Mohammad, Mohammadi Arash, Malekzadeh Parvin, Plataniotis Konstantinos N

Affiliations

Concordia Institute for Information System Engineering, Concordia University, Montreal, QC H3G 1M8, Canada.

Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada.

Publication Information

Sensors (Basel). 2022 Feb 11;22(4):1393. doi: 10.3390/s22041393.

DOI: 10.3390/s22041393
PMID: 35214293
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8962978/
Abstract

Development of distributed Multi-Agent Reinforcement Learning (MARL) algorithms has attracted a surge of interest lately. Generally speaking, conventional Model-Based (MB) or Model-Free (MF) RL algorithms are not directly applicable to MARL problems due to their use of a fixed reward model for learning the underlying value function. While Deep Neural Network (DNN)-based solutions perform well, they are still prone to overfitting, high sensitivity to parameter selection, and sample inefficiency. In this paper, an adaptive Kalman Filter (KF)-based framework is introduced as an efficient alternative that addresses the aforementioned problems by capitalizing on unique characteristics of the KF, such as uncertainty modeling and online second-order learning. More specifically, the paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its Successor Representation-based variant, referred to as MAK-SR. The proposed MAK-TD/SR frameworks account for the continuous nature of the action space associated with high-dimensional multi-agent environments and exploit Kalman Temporal Difference (KTD) to address parameter uncertainty. The frameworks are evaluated via several experiments implemented on the OpenAI Gym MARL benchmarks, using different numbers of agents in cooperative, competitive, and mixed (cooperative-competitive) scenarios. The experimental results illustrate the superior performance of the proposed MAK-TD/SR frameworks compared to their state-of-the-art counterparts.
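The KTD idea at the core of MAK-TD treats the value-function weights as the hidden state of a Kalman filter and treats the TD target as a noisy measurement of those weights, so the filter tracks both a point estimate and its uncertainty (the covariance P). The following is a minimal single-agent, linear-features sketch of one such update, not the authors' implementation; the function name, the feature vectors `phi_s`/`phi_next`, and the noise levels `R` (observation) and `Q` (process) are illustrative assumptions.

```python
import numpy as np

def ktd_update(theta, P, phi_s, phi_next, reward, gamma=0.9, R=1.0, Q=1e-4):
    """One Kalman Temporal Difference step for a linear value function
    V(s) = phi(s) @ theta.  The Bellman residual is treated as the
    measurement residual of a Kalman filter over the weights theta."""
    d = theta.shape[0]
    H = phi_s - gamma * phi_next        # effective observation vector (1 x d)
    P = P + Q * np.eye(d)               # process noise: let uncertainty grow slightly
    innovation = reward - H @ theta     # TD error viewed as measurement residual
    S = H @ P @ H + R                   # innovation variance (scalar)
    K = P @ H / S                       # Kalman gain (d,)
    theta = theta + K * innovation      # correct weight estimate
    P = P - np.outer(K, H @ P)          # shrink covariance by the information gained
    return theta, P
```

Because P is maintained online, the filter yields per-weight uncertainty for free; this is the "uncertainty modeling and online second-order learning" the abstract credits as the advantage of KF-based learning over plain gradient TD.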

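The Successor Representation (SR) that distinguishes MAK-SR factorizes the value function into expected discounted state occupancies M and a reward weight vector w, so that V(s) = M[s] @ w, and transition structure is learned separately from reward. Below is a minimal tabular TD(0) sketch of the SR update for illustration only; MAK-SR itself learns the SR with the Kalman machinery rather than a fixed learning rate, and the function name and parameters here are assumptions.

```python
import numpy as np

def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.9):
    """TD(0) update of a tabular successor representation.

    M[i, j] estimates the expected discounted number of future visits
    to state j when starting from state i under the current policy."""
    onehot = np.zeros(M.shape[1])
    onehot[s] = 1.0
    # SR Bellman target: immediate occupancy + discounted occupancy from s'
    td_error = onehot + gamma * M[s_next] - M[s]
    M[s] = M[s] + alpha * td_error
    return M
```

Given a learned M, a changed reward function only requires re-fitting w (e.g. by regression on observed rewards), which is why SR-based agents can adapt quickly when rewards change while the dynamics stay fixed.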

Figures (PMC)

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/8962978/39418aa4e81a/sensors-22-01393-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/8962978/eb795575ad06/sensors-22-01393-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/8962978/ed2614d27d43/sensors-22-01393-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/8962978/5f6e57af2814/sensors-22-01393-g004a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/8962978/0aa03441b34e/sensors-22-01393-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/8962978/f50454e37265/sensors-22-01393-g006.jpg

Similar Articles

1. Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation.
Sensors (Basel). 2022 Feb 11;22(4):1393. doi: 10.3390/s22041393.
2. Strangeness-driven exploration in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
3. Reinforcement Learning-based Kalman Filter for Adaptive Brain Control in Brain-Machine Interface.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:6619-6622. doi: 10.1109/EMBC46164.2021.9629511.
4. LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.
Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
5. Multi-Agent Reinforcement Learning for Joint Cooperative Spectrum Sensing and Channel Access in Cognitive UAV Networks.
Sensors (Basel). 2022 Feb 20;22(4):1651. doi: 10.3390/s22041651.
6. Cluster Kernel Reinforcement Learning-based Kalman Filter for Three-Lever Discrimination Task in Brain-Machine Interface.
Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:690-693. doi: 10.1109/EMBC48229.2022.9871669.
7. Decentralized multi-agent reinforcement learning based on best-response policies.
Front Robot AI. 2024 Apr 16;11:1229026. doi: 10.3389/frobt.2024.1229026. eCollection 2024.
8. Inference-Based Posteriori Parameter Distribution Optimization.
IEEE Trans Cybern. 2022 May;52(5):3006-3017. doi: 10.1109/TCYB.2020.3023127. Epub 2022 May 19.
9. A Distributional Perspective on Multiagent Cooperation With Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4246-4259. doi: 10.1109/TNNLS.2022.3202097. Epub 2024 Feb 29.
10. Toward the biological model of the hippocampus as the successor representation agent.
Biosystems. 2022 Mar;213:104612. doi: 10.1016/j.biosystems.2022.104612. Epub 2022 Jan 29.

Cited By

1. Special Issue on Machine Learning and AI for Sensors.
Sensors (Basel). 2023 Mar 3;23(5):2770. doi: 10.3390/s23052770.
2. Multi-Agent Credit Assignment and Bankruptcy Game for Improving Resource Allocation in Smart Cities.
Sensors (Basel). 2023 Feb 6;23(4):1804. doi: 10.3390/s23041804.

References

1. A complementary learning systems approach to temporal difference learning.
Neural Netw. 2020 Feb;122:218-230. doi: 10.1016/j.neunet.2019.10.011. Epub 2019 Oct 26.
2. The successor representation in human reinforcement learning.
Nat Hum Behav. 2017 Sep;1(9):680-692. doi: 10.1038/s41562-017-0180-8. Epub 2017 Aug 28.
3. Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems.
IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1308-1320. doi: 10.1109/TNNLS.2018.2861945. Epub 2018 Sep 26.
4. Predictive representations can link model-based reinforcement learning to model-free mechanisms.
PLoS Comput Biol. 2017 Sep 25;13(9):e1005768. doi: 10.1371/journal.pcbi.1005768. eCollection 2017 Sep.
5. Multi-Sensor Fusion with Interaction Multiple Model and Chi-Square Test Tolerant Filter.
Sensors (Basel). 2016 Nov 2;16(11):1835. doi: 10.3390/s16111835.
6. Algorithmic survey of parametric value function approximation.
IEEE Trans Neural Netw Learn Syst. 2013 Jun;24(6):845-67. doi: 10.1109/TNNLS.2013.2247418.