Yan Peng, Jia Tao, Bai Chengchao
School of Astronautics, Harbin Institute of Technology, Harbin 150001, China.
Aerospace Technology Research Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China.
Sensors (Basel). 2021 Feb 4;21(4):1076. doi: 10.3390/s21041076.
Unmanned aerial vehicles (UAVs) are widely used in search and rescue (SAR) missions because of their high flexibility. A key problem in SAR missions is searching for and tracking moving targets in an area of interest. In this paper, we focus on the problem of Cooperative Multi-UAV Observation of Multiple Moving Targets (CMUOMMT). In contrast to the existing literature, we not only optimize the average observation rate of the discovered targets but also emphasize fair observation of the discovered targets and continued exploration for undiscovered targets, under the assumption that the total number of targets is unknown. To achieve this objective, we propose a deep reinforcement learning (DRL)-based method under the Partially Observable Markov Decision Process (POMDP) framework, in which each UAV maintains four observation history maps, and maps from different UAVs within communication range can be merged to enhance the UAVs' awareness of the environment. A deep convolutional neural network (CNN) processes the merged maps and generates the control commands for the UAVs. Simulation results show that, compared with other methods, our policy enables UAVs to balance fair observation of the discovered targets against exploration of the search region.
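The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how fairness is measured or how maps are merged, so the sketch assumes Jain's fairness index as the fairness measure and an element-wise maximum as the merge rule. All function names (`observation_rates`, `jain_fairness`, `merge_maps`) and the log layout `observed[t][j]` are hypothetical.

```python
from typing import List


def observation_rates(observed: List[List[bool]]) -> List[float]:
    """Per-target observation rate.

    observed[t][j] is True if target j was seen by at least one UAV
    at time step t (hypothetical log layout, not from the paper).
    """
    steps = len(observed)
    n_targets = len(observed[0])
    return [sum(row[j] for row in observed) / steps for j in range(n_targets)]


def average_rate(rates: List[float]) -> float:
    """Mean observation rate over all discovered targets."""
    return sum(rates) / len(rates)


def jain_fairness(rates: List[float]) -> float:
    """Jain's fairness index: 1.0 when all rates are equal, 1/n at worst.

    Assumption: the abstract does not say which fairness measure is used.
    """
    total = sum(rates)
    if total == 0:
        return 0.0  # no target was ever observed
    return total ** 2 / (len(rates) * sum(r * r for r in rates))


def merge_maps(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Merge two equally sized grid maps by element-wise maximum.

    Assumption: each cell holds something like a last-observed time stamp,
    so the larger value is the fresher one. The actual merge rule may differ.
    """
    return [[max(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


if __name__ == "__main__":
    # Two targets over four steps: the first is always seen, the second once.
    log = [[True, False], [True, True], [True, False], [True, False]]
    rates = observation_rates(log)
    print(rates, average_rate(rates), jain_fairness(rates))
    print(merge_maps([[0, 3], [2, 0]], [[1, 1], [0, 5]]))
```

In this toy log the average rate is fairly high while the fairness index is well below 1, which is the tension the abstract describes: a policy can score well on average observation while neglecting some of the discovered targets.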