Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN-LSTM fusion network.

Affiliations

School of Computer and Information, Hohai University, Nanjing 210098, People's Republic of China; School of Artificial Intelligence, Hohai University, Nanjing 210098, People's Republic of China.

College of Science, Hohai University, Nanjing 210098, People's Republic of China.

Publication information

Neural Netw. 2023 May;162:21-33. doi: 10.1016/j.neunet.2023.02.027. Epub 2023 Feb 24.

DOI:10.1016/j.neunet.2023.02.027

PMID:36878168

Abstract

This paper addresses autonomous and effective collision avoidance for multiple unmanned aerial vehicles (multi-UAV) in limited airspace, within the framework of the proximal policy optimization (PPO) algorithm. An end-to-end deep reinforcement learning (DRL) control strategy and a potential-based reward function are designed. Next, a CNN-LSTM (CL) fusion network is constructed by fusing a convolutional neural network (CNN) with a long short-term memory (LSTM) network, enabling feature interaction among the information of the multiple UAVs. Then, a generalized integral compensator (GIC) is introduced into the actor-critic structure, and the CLPPO-GIC algorithm is proposed by combining CL and GIC. Finally, the learned policy is validated through performance evaluation in various simulation environments. The simulation results show that introducing the LSTM network and the GIC further improves collision-avoidance efficiency, and the robustness and accuracy of the algorithm are verified in different environments.
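The abstract does not give the form of the potential-based reward function. As a rough illustration only, the sketch below uses the standard potential-based shaping term, gamma * phi(s') - phi(s), with a hypothetical potential equal to the negative Euclidean distance from a UAV to its goal. The names `potential` and `shaped_reward`, the choice of potential, and the default `gamma` are all assumptions, not the authors' design.

```python
import math


def potential(position, goal):
    """Hypothetical potential: negative Euclidean distance to the goal.

    Assumption for illustration; the paper's actual potential is not
    stated in the abstract.
    """
    return -math.dist(position, goal)


def shaped_reward(position, next_position, goal, gamma=0.99, base_reward=0.0):
    """Return base_reward plus the shaping term gamma * phi(s') - phi(s)."""
    shaping = gamma * potential(next_position, goal) - potential(position, goal)
    return base_reward + shaping


if __name__ == "__main__":
    goal = (0.0, 0.0, 0.0)
    # Moving toward the goal yields a positive shaping reward.
    print(shaped_reward((3.0, 4.0, 0.0), (0.0, 3.0, 0.0), goal))
```

Under this assumed potential, steps that reduce the distance to the goal are rewarded and steps that increase it are penalized. A collision penalty or arrival bonus could be passed in through `base_reward`.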
