
Hierarchical and Stable Multiagent Reinforcement Learning for Cooperative Navigation Control.

Authors

Jin Yue, Wei Shuangqing, Yuan Jian, Zhang Xudong

Publication

IEEE Trans Neural Netw Learn Syst. 2023 Jan;34(1):90-103. doi: 10.1109/TNNLS.2021.3089834. Epub 2023 Jan 5.

Abstract

We solve an important and challenging cooperative navigation control problem, Multiagent Navigation to Unassigned Multiple targets (MNUM) in unknown environments, with minimal time and without collision. Conventional methods are based on multiagent path planning, which requires building an environment map and performing expensive real-time path planning computations. In this article, we formulate MNUM as a stochastic game and devise a novel multiagent deep reinforcement learning (MADRL) algorithm to learn an end-to-end solution that directly maps raw sensor data to control signals. Once learned, the policy can be deployed onto each agent, and the expensive online planning computations can thereby be offloaded. However, when applied to MNUM, traditional MADRL suffers from a large policy solution space and a nonstationary environment, since agents make decisions independently and concurrently. Accordingly, we propose a hierarchical and stable MADRL algorithm. The hierarchical learning part introduces a two-layer policy model to reduce the solution space and uses an interlaced learning paradigm to learn the two coupled policies. In the stable learning part, we propose learning an extended action-value function that implicitly incorporates estimates of the other agents' actions, which alleviates the environment's nonstationarity caused by the other agents' changing policies. Extensive experiments demonstrate that our method converges quickly and generates more efficient cooperative navigation policies than comparable methods.
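The two ideas in the abstract, a two-layer policy (target selection on top, navigation control below) and an extended action-value function conditioned on estimates of the other agents' actions, can be sketched minimally as follows. The greedy nearest-target rule, the unit-velocity controller, and the linear value function here are illustrative stand-ins for the paper's learned networks, not its actual method.

```python
import math

def high_level_policy(agent_pos, targets):
    """Upper-layer policy: pick the index of the nearest target
    (a greedy stand-in for the learned target-selection policy)."""
    dists = [math.dist(agent_pos, t) for t in targets]
    return dists.index(min(dists))

def low_level_policy(agent_pos, target):
    """Lower-layer policy: unit-velocity control toward the chosen
    target (a stand-in for the learned navigation policy)."""
    dx, dy = target[0] - agent_pos[0], target[1] - agent_pos[1]
    norm = math.hypot(dx, dy) or 1.0
    return (dx / norm, dy / norm)

def step(agent_pos, targets, dt=0.1):
    """One hierarchical decision step: select a target with the upper
    layer, then act on it with the lower layer."""
    k = high_level_policy(agent_pos, targets)
    vx, vy = low_level_policy(agent_pos, targets[k])
    return (agent_pos[0] + vx * dt, agent_pos[1] + vy * dt), k

def extended_q(state, own_action, est_other_actions, weights):
    """Toy linear stand-in for the extended action-value function
    Q(s, a_i, a_hat_{-i}): besides the agent's own action, it also
    conditions on estimates of the other agents' actions, which is
    what stabilizes learning under their changing policies."""
    feats = (list(state) + list(own_action)
             + [a for pair in est_other_actions for a in pair])
    return sum(w * f for w, f in zip(weights, feats))
```

Decomposing the decision this way shrinks the joint solution space: the upper layer searches over a small discrete set of targets, while the lower layer only has to solve local point-to-point navigation.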

