Suppr 超能文献



Enhanced Q learning and deep reinforcement learning for unmanned combat intelligence planning in adversarial environments.

Authors

Jianhong Xu, Gongqian Liang

Affiliations

School of Equipment Management and UAV Engineering, Air Force Engineering University, Xi'an, 710051, Shaanxi, China.

School of Management, Northwest Polytechnic University, Xi'an, 710012, Shaanxi, China.

Publication

Sci Rep. 2025 Aug 4;15(1):28364. doi: 10.1038/s41598-025-13752-3.

DOI: 10.1038/s41598-025-13752-3
PMID: 40760150
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12322001/
Abstract

This study proposes a multimodal deep reinforcement learning (MDRL) architecture, Multimodal Deep Reinforcement Learning-Deep Q-Network (MDRL-DQN), based on an improved Q-Learning algorithm. It aims to optimize Unmanned Aerial Vehicle (UAV) scheduling and execution capabilities in intelligent unmanned combat planning. By integrating an attention mechanism and an adaptive reward mechanism, the algorithm effectively fuses image data, sensor data, and intelligent information, enabling collaborative multimodal data processing. This improves task success rates, execution efficiency, and UAV deployment stability. Experimental results show that the improved MDRL-DQN algorithm demonstrates significant advantages in complex task scenarios. Specifically, in the long-distance dispersed defense (Scenario 1) and long-distance concentrated defense (Scenario 3), the task success rates reach 89.6% and 94.8%, respectively, outperforming other algorithms by several percentage points. Additionally, in Scenario 1, MDRL-DQN completes tasks in 720.8 s, which is 16.7% faster than Proximal Policy Optimization (PPO) at 865.3 s, highlighting its superior execution efficiency. These results indicate that the improved Q-Learning algorithm effectively enhances the efficiency and stability of unmanned combat tasks, providing new insights for intelligent planning in future unmanned operations.
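The abstract's three core ingredients — attention-weighted fusion of image, sensor, and intelligence features; an adaptive reward that compensates for a low task success rate; and a Q-learning backup — can be sketched in miniature. This is an illustrative toy, not the authors' MDRL-DQN: the mean-based attention score, the `target`-gap scaling rule in `adaptive_reward`, and the tabular Q-table are stand-ins for the learned networks the paper describes.

```python
import math

def attention_fuse(modalities):
    """Softmax-attention fusion of equal-length modality feature vectors.
    Each vector's mean stands in for a learned relevance score."""
    scores = [sum(m) / len(m) for m in modalities]
    mx = max(scores)
    weights = [math.exp(s - mx) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]          # normalized attention weights
    dim = len(modalities[0])
    return [sum(w * m[i] for w, m in zip(weights, modalities)) for i in range(dim)]

def adaptive_reward(base_reward, success_rate, target=0.9):
    """Scale the reward up while the observed task success rate is below target,
    so struggling phases of training receive a stronger learning signal."""
    return base_reward * (1.0 + max(0.0, target - success_rate))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning backup on a tabular Q."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])

# One illustrative step: fuse three modality feature vectors, shape the
# reward, and back up a two-state, two-action Q-table.
image_feat  = [0.9, 0.2, 0.4]
sensor_feat = [0.1, 0.8, 0.5]
intel_feat  = [0.3, 0.3, 0.7]
state_feat = attention_fuse([image_feat, sensor_feat, intel_feat])
Q = [[0.0, 0.0], [0.0, 0.0]]
r = adaptive_reward(1.0, success_rate=0.6)  # boosted while below the 0.9 target
q_update(Q, s=0, a=1, r=r, s_next=1)
```

In the paper's architecture these roles are played by an attention module over deep feature encoders and a DQN value network; the sketch only shows how the three mechanisms interlock in a single decision step.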


Figures 1–9 (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/f7353eba9c2c/41598_2025_13752_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/88fdbaa82463/41598_2025_13752_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/a26df1e103e8/41598_2025_13752_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/a55444000972/41598_2025_13752_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/206764ef98b9/41598_2025_13752_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/d5f6806df225/41598_2025_13752_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/401c7027ac0b/41598_2025_13752_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/5ee52426b41b/41598_2025_13752_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/0f0cfdb85406/41598_2025_13752_Fig9_HTML.jpg

Similar Articles

1. An unmanned intelligent inspection technology based on improved reinforcement learning algorithm for power large-area multi-scene inspection.
Sci Rep. 2025 Jul 10;15(1):24933. doi: 10.1038/s41598-025-10121-y.
2. Multi class aerial image classification in UAV networks employing Snake Optimization Algorithm with Deep Learning.
Sci Rep. 2025 Jul 4;15(1):23872. doi: 10.1038/s41598-025-04570-8.
3. Design of a dynamic trust management and defense decision system for shared vehicle data based on blockchain and deep reinforcement learning.
Sci Rep. 2025 Jul 22;15(1):26662. doi: 10.1038/s41598-025-11511-y.
4. Research of UAV 3D path planning based on improved Dwarf mongoose algorithm with multiple strategies.
Sci Rep. 2025 Jul 24;15(1):26979. doi: 10.1038/s41598-025-11492-y.
5. Improved double DQN with deep reinforcement learning for UAV indoor autonomous obstacle avoidance.
Sci Rep. 2025 Aug 1;15(1):28133. doi: 10.1038/s41598-025-02356-6.
6. DRL-Driven Intelligent SFC Deployment in MEC Workload for Dynamic IoT Networks.
Sensors (Basel). 2025 Jul 8;25(14):4257. doi: 10.3390/s25144257.
7. MPN-RRT*: A New Method in 3D Urban Path Planning for UAV Integrating Deep Learning and Sampling Optimization.
Sensors (Basel). 2025 Jul 2;25(13):4142. doi: 10.3390/s25134142.
8. Proximal Policy Optimization-based Task Offloading Framework for Smart Disaster Monitoring using UAV-assisted WSNs.
MethodsX. 2025 Jun 26;15:103472. doi: 10.1016/j.mex.2025.103472. eCollection 2025 Dec.
9. Accurate recognition of UAVs on multi-scenario perception with YOLOv9-CAG.
Sci Rep. 2025 Jul 30;15(1):27755. doi: 10.1038/s41598-025-12670-8.
