Suppr 超能文献



Enhanced Q learning and deep reinforcement learning for unmanned combat intelligence planning in adversarial environments.

Authors

Jianhong Xu, Gongqian Liang

Affiliations

School of Equipment Management and UAV Engineering, Air Force Engineering University, Xi'an, 710051, Shaanxi, China.

School of Management, Northwest Polytechnic University, Xi'an, 710012, Shaanxi, China.

Publication

Sci Rep. 2025 Aug 4;15(1):28364. doi: 10.1038/s41598-025-13752-3.

DOI: 10.1038/s41598-025-13752-3
PMID: 40760150
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12322001/
Abstract

This study proposes a multimodal deep reinforcement learning (MDRL) architecture, Multimodal Deep Reinforcement Learning-Deep Q-Network (MDRL-DQN), based on an improved Q-Learning algorithm. It aims to optimize Unmanned Aerial Vehicle (UAV) scheduling and execution capabilities in intelligent unmanned combat planning. By integrating an attention mechanism and an adaptive reward mechanism, the algorithm effectively fuses image data, sensor data, and intelligent information, enabling collaborative multimodal data processing. This improves task success rates, execution efficiency, and UAV deployment stability. Experimental results show that the improved MDRL-DQN algorithm demonstrates significant advantages in complex task scenarios. Specifically, in the long-distance dispersed defense (Scenario 1) and long-distance concentrated defense (Scenario 3), the task success rates reach 89.6% and 94.8%, respectively, outperforming other algorithms by several percentage points. Additionally, in Scenario 1, MDRL-DQN completes tasks in 720.8 s, which is 16.7% faster than Proximal Policy Optimization (PPO) at 865.3 s, highlighting its superior execution efficiency. These results indicate that the improved Q-Learning algorithm effectively enhances the efficiency and stability of unmanned combat tasks, providing new insights for intelligent planning in future unmanned operations.
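The abstract's three core ingredients — attention-weighted fusion of image, sensor, and intelligence features; an adaptive reward that compensates for a low task success rate; and a Q-learning backup — can be sketched in miniature. This is an illustrative toy, not the authors' MDRL-DQN: the mean-based attention score, the `target`-gap scaling rule in `adaptive_reward`, and the tabular Q-table are stand-ins for the learned networks the paper describes.

```python
import math

def attention_fuse(modalities):
    """Softmax-attention fusion of equal-length modality feature vectors.
    Each vector's mean stands in for a learned relevance score."""
    scores = [sum(m) / len(m) for m in modalities]
    mx = max(scores)
    weights = [math.exp(s - mx) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]          # normalized attention weights
    dim = len(modalities[0])
    return [sum(w * m[i] for w, m in zip(weights, modalities)) for i in range(dim)]

def adaptive_reward(base_reward, success_rate, target=0.9):
    """Scale the reward up while the observed task success rate is below target,
    so struggling phases of training receive a stronger learning signal."""
    return base_reward * (1.0 + max(0.0, target - success_rate))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning backup on a tabular Q."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])

# One illustrative step: fuse three modality feature vectors, shape the
# reward, and back up a two-state, two-action Q-table.
image_feat  = [0.9, 0.2, 0.4]
sensor_feat = [0.1, 0.8, 0.5]
intel_feat  = [0.3, 0.3, 0.7]
state_feat = attention_fuse([image_feat, sensor_feat, intel_feat])
Q = [[0.0, 0.0], [0.0, 0.0]]
r = adaptive_reward(1.0, success_rate=0.6)  # boosted while below the 0.9 target
q_update(Q, s=0, a=1, r=r, s_next=1)
```

In the paper's architecture these roles are played by an attention module over deep feature encoders and a DQN value network; the sketch only shows how the three mechanisms interlock in a single decision step.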


Figures 1–9 (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/f7353eba9c2c/41598_2025_13752_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/88fdbaa82463/41598_2025_13752_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/a26df1e103e8/41598_2025_13752_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/a55444000972/41598_2025_13752_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/206764ef98b9/41598_2025_13752_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/d5f6806df225/41598_2025_13752_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/401c7027ac0b/41598_2025_13752_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/5ee52426b41b/41598_2025_13752_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8712/12322001/0f0cfdb85406/41598_2025_13752_Fig9_HTML.jpg

Similar Articles

1. An unmanned intelligent inspection technology based on improved reinforcement learning algorithm for power large-area multi-scene inspection.
Sci Rep. 2025 Jul 10;15(1):24933. doi: 10.1038/s41598-025-10121-y.
2. Multi class aerial image classification in UAV networks employing Snake Optimization Algorithm with Deep Learning.
Sci Rep. 2025 Jul 4;15(1):23872. doi: 10.1038/s41598-025-04570-8.
3. Design of a dynamic trust management and defense decision system for shared vehicle data based on blockchain and deep reinforcement learning.
Sci Rep. 2025 Jul 22;15(1):26662. doi: 10.1038/s41598-025-11511-y.
4. Research of UAV 3D path planning based on improved Dwarf mongoose algorithm with multiple strategies.
Sci Rep. 2025 Jul 24;15(1):26979. doi: 10.1038/s41598-025-11492-y.
5. Improved double DQN with deep reinforcement learning for UAV indoor autonomous obstacle avoidance.
Sci Rep. 2025 Aug 1;15(1):28133. doi: 10.1038/s41598-025-02356-6.
6. DRL-Driven Intelligent SFC Deployment in MEC Workload for Dynamic IoT Networks.
Sensors (Basel). 2025 Jul 8;25(14):4257. doi: 10.3390/s25144257.
7. MPN-RRT*: A New Method in 3D Urban Path Planning for UAV Integrating Deep Learning and Sampling Optimization.
Sensors (Basel). 2025 Jul 2;25(13):4142. doi: 10.3390/s25134142.
8. Proximal Policy Optimization-based Task Offloading Framework for Smart Disaster Monitoring using UAV-assisted WSNs.
MethodsX. 2025 Jun 26;15:103472. doi: 10.1016/j.mex.2025.103472. eCollection 2025 Dec.
9. Accurate recognition of UAVs on multi-scenario perception with YOLOv9-CAG.
Sci Rep. 2025 Jul 30;15(1):27755. doi: 10.1038/s41598-025-12670-8.
