

Decentralized Policy Coordination in Mobile Sensing with Consensual Communication.

Affiliations

School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046, China.

Publication

Sensors (Basel). 2022 Dec 7;22(24):9584. doi: 10.3390/s22249584.

DOI: 10.3390/s22249584
PMID: 36559953
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9782600/
Abstract

In a typical mobile-sensing scenario, multiple autonomous vehicles cooperatively navigate to maximize the spatial-temporal coverage of the environment. However, as each vehicle can only make decentralized navigation decisions based on limited local observations, coordinating the vehicles for cooperation in an open, dynamic environment remains a critical challenge. In this paper, we propose a novel framework that incorporates consensual communication into multi-agent reinforcement learning for cooperative mobile sensing. At each step, the vehicles first learn to communicate with each other and then navigate based on the messages received from others. Through communication, the decentralized vehicles can share information to break through the dilemma of local observation. Moreover, we utilize mutual information as a regularizer to promote consensus among the vehicles. The mutual information enforces a positive correlation between the navigation policy and the communication messages, and therefore implicitly coordinates the decentralized policies. The convergence of this regularized algorithm can be proved theoretically under certain mild assumptions. In the experiments, we show that our algorithm is scalable and converges very fast during the training phase. It also outperforms other baselines significantly in the execution phase. The results validate that consensual communication plays a very important role in coordinating the behaviors of decentralized vehicles.
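The idea of a mutual-information regularizer, maximizing expected return plus a bonus that grows when messages and actions are correlated, can be sketched with a toy tabular example. This is an illustrative assumption, not the paper's implementation: the function names, the discrete message/action joint table, and the weight `lam` are all hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """I(M; A) from a joint probability table p(m, a).

    Rows index messages, columns index actions; entries must sum to 1.
    """
    pm = joint.sum(axis=1, keepdims=True)   # marginal p(m), shape (M, 1)
    pa = joint.sum(axis=0, keepdims=True)   # marginal p(a), shape (1, A)
    mask = joint > 0                        # avoid log(0) on zero cells
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pm @ pa)[mask])))

def regularized_objective(expected_return, joint_message_action, lam=0.1):
    """J = E[R] + lam * I(M; A): return plus a consensus bonus that rewards
    positive correlation between communication messages and actions."""
    return expected_return + lam * mutual_information(joint_message_action)
```

With two messages and two actions, an independent table (all cells 0.25) yields I = 0 and no bonus, while a perfectly aligned table (diagonal 0.5, 0.5) yields I = ln 2, the maximum bonus. In the actual algorithm the MI term would be estimated from on-policy samples and added to the policy-gradient loss; the tabular form here only illustrates why message-aligned policies are favored.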


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/415e33071665/sensors-22-09584-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/81dae27b9132/sensors-22-09584-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/8010c8eafe2b/sensors-22-09584-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/23e63622670c/sensors-22-09584-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/abb32454f0a1/sensors-22-09584-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/5e623a512fc2/sensors-22-09584-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/ed24e34607dc/sensors-22-09584-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/83aebaff93fc/sensors-22-09584-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/f7fc30f7e456/sensors-22-09584-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/9782600/3f01e4362534/sensors-22-09584-g010.jpg

Similar articles

1. Coordination as inference in multi-agent reinforcement learning.
   Neural Netw. 2024 Apr;172:106101. doi: 10.1016/j.neunet.2024.106101. Epub 2024 Jan 11.
2. IHG-MA: Inductive heterogeneous graph multi-agent reinforcement learning for multi-intersection traffic signal control.
   Neural Netw. 2021 Jul;139:265-277. doi: 10.1016/j.neunet.2021.03.015. Epub 2021 Mar 22.
3. Optimistic sequential multi-agent reinforcement learning with motivational communication.
   Neural Netw. 2024 Nov;179:106547. doi: 10.1016/j.neunet.2024.106547. Epub 2024 Jul 22.
4. Egoism, utilitarianism and egalitarianism in multi-agent reinforcement learning.
   Neural Netw. 2024 Oct;178:106544. doi: 10.1016/j.neunet.2024.106544. Epub 2024 Jul 24.
5. Decentralized Opportunistic Spectrum Resources Access Model and Algorithm toward Cooperative Ad-Hoc Networks.
   PLoS One. 2016 Jan 4;11(1):e0145526. doi: 10.1371/journal.pone.0145526. eCollection 2016.
6. HyperComm: Hypergraph-based communication in multi-agent reinforcement learning.
   Neural Netw. 2024 Oct;178:106432. doi: 10.1016/j.neunet.2024.106432. Epub 2024 Jun 10.
7. Learning to Cooperate via an Attention-Based Communication Neural Network in Decentralized Multi-Robot Exploration.
   Entropy (Basel). 2019 Mar 19;21(3):294. doi: 10.3390/e21030294.
8. An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control.
   Neural Netw. 2024 Feb;170:610-621. doi: 10.1016/j.neunet.2023.11.046. Epub 2023 Nov 23.
9. A traffic light control method based on multi-agent deep reinforcement learning algorithm.
   Sci Rep. 2023 Jun 9;13(1):9396. doi: 10.1038/s41598-023-36606-2.
