
Leader-follower UAVs formation control based on a deep Q-network collaborative framework.

Author Information

Liu Zhijun, Li Jie, Shen Jian, Wang Xiaoguang, Chen Pengyun

Affiliations

Shenzhen MSU-BIT University, Shenzhen, 518172, China.

School of Mechatronical Engineering, Beijing Institute of Technology, Beijing, 100081, China.

Publication Information

Sci Rep. 2024 Feb 26;14(1):4674. doi: 10.1038/s41598-024-54531-w.

Abstract

This study examines a collaborative framework that uses a deep Q-network (DQN) to regulate leader-follower formations of Unmanned Aerial Vehicles (UAVs). The aim is to tackle the challenges posed by the highly dynamic and uncertain flight environment of UAVs. We first develop a dynamic model that captures the collective state of the system, encompassing variables such as the relative positions, heading angles, rolling angles, and velocities of the nodes in the formation. We then formulate the collaborative operation of the UAVs as a Markov Decision Process (MDP) and employ Reinforcement Learning (RL) to solve it. On this basis, a fundamental framework is presented for the UAV formation control problem under the DQN scheme, including an action-selection technique known as [Formula: see text]-imitation as well as algorithmic details. Finally, the efficacy and portability of the DQN-based approach are substantiated by numerical simulation. The average reward curve converges satisfactorily, and the kinematic link between the nodes in the formation satisfies the essential requirements for controller design.
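The DQN pipeline the abstract describes (an MDP over the formation state, a discrete action set, a reward, and a Q-network updated from TD errors) can be sketched on a toy problem. Everything below is an illustrative assumption, not the authors' implementation: the state is reduced to a hypothetical 1-D gap error plus relative speed for a single follower, the actions to accelerate/hold/decelerate, and plain ε-greedy exploration is used in place of the paper's [Formula: see text]-imitation selection, whose details are not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, N_ACTIONS, HIDDEN = 2, 3, 16   # toy sizes, chosen for illustration
GAMMA, LR, EPS = 0.95, 0.01, 0.1

def init_net():
    # One-hidden-layer Q-network with tanh activation, trained by manual SGD.
    return {"W1": rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN)),
            "b1": np.zeros(HIDDEN),
            "W2": rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS)),
            "b2": np.zeros(N_ACTIONS)}

def q_values(net, s):
    h = np.tanh(s @ net["W1"] + net["b1"])
    return h @ net["W2"] + net["b2"], h

def step_env(gap_err, rel_v, action):
    # Hypothetical dynamics: the follower nudges its speed relative to the
    # leader; reward penalizes deviation from the desired gap (gap_err = 0).
    rel_v += (action - 1) * 0.1          # 0=decelerate, 1=hold, 2=accelerate
    gap_err += rel_v * 0.1
    return gap_err, rel_v, -abs(gap_err), abs(gap_err) > 5.0

def train(episodes=200, steps=50):
    net = init_net()
    for _ in range(episodes):
        gap_err, rel_v = rng.uniform(-1, 1), 0.0
        for _ in range(steps):
            s = np.array([gap_err, rel_v])
            q, h = q_values(net, s)
            a = int(rng.integers(N_ACTIONS)) if rng.random() < EPS else int(np.argmax(q))
            gap_err, rel_v, r, done = step_env(gap_err, rel_v, a)
            q2, _ = q_values(net, np.array([gap_err, rel_v]))
            target = r + (0.0 if done else GAMMA * np.max(q2))
            td = q[a] - target           # TD error on the chosen action
            # Gradient of 0.5 * td^2 through the two layers (dh before W2 update).
            dh = td * net["W2"][:, a] * (1.0 - h ** 2)
            net["W2"][:, a] -= LR * td * h
            net["b2"][a] -= LR * td
            net["W1"] -= LR * np.outer(s, dh)
            net["b1"] -= LR * dh
            if done:
                break
    return net

net = train()
q, _ = q_values(net, np.array([0.5, 0.0]))   # Q-values for one follower state
```

A replay buffer and target network, standard in full DQN implementations, are omitted here to keep the update rule visible; the paper's multi-UAV state (relative positions, heading and rolling angles, velocities) would simply widen `STATE_DIM` and the action set.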


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c75a/11316062/0e802d4672ce/41598_2024_54531_Fig1_HTML.jpg
