
A hierarchical reinforcement learning method for missile evasion and guidance.

Authors

Yan Mengda, Yang Rennong, Zhang Ying, Yue Longfei, Hu Dongyuan

Affiliations

School of Air Traffic Control and Navigation, Air Force Engineering University, Xi'an, 710051, China.

Publication Information

Sci Rep. 2022 Nov 7;12(1):18888. doi: 10.1038/s41598-022-21756-6.

DOI: 10.1038/s41598-022-21756-6
PMID: 36344598
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9640633/
Abstract

This paper proposes an algorithm for missile manoeuvring based on a hierarchical proximal policy optimization (PPO) reinforcement learning algorithm, which enables a missile to guide to a target and evade an interceptor at the same time. Based on the idea of task hierarchy, the agent has a two-layer structure, in which low-level agents control basic actions and are controlled by a high-level agent. The low level has two agents called a guidance agent and an evasion agent, which are trained in simple scenarios and embedded in the high-level agent. The high level has a policy selector agent, which chooses one of the low-level agents to activate at each decision moment. The reward functions for each agent are different, considering the guidance accuracy, flight time, and energy consumption metrics, as well as a field-of-view constraint. Simulation shows that the PPO algorithm without a hierarchical structure cannot complete the task, while the hierarchical PPO algorithm has a 100% success rate on a test dataset. The agent shows good adaptability and strong robustness to the second-order lag of autopilot and measurement noises. Compared with a traditional guidance law, the reinforcement learning guidance law has satisfactory guidance accuracy and significant advantages in average time and average energy consumption.
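The two-layer structure the abstract describes — pre-trained low-level guidance and evasion agents, with a high-level policy selector choosing which one acts at each decision moment — can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: the PPO training loop is omitted, the threshold rule stands in for the learned selector policy, and all class names, observation fields, and numeric values are assumptions made for the example.

```python
class LowLevelAgent:
    """Hypothetical low-level agent: maps an observation to a basic manoeuvre command."""

    def __init__(self, name, gain):
        self.name = name
        self.gain = gain  # placeholder for a trained policy's parameters

    def act(self, obs):
        # Placeholder policy: a proportional command on the line-of-sight rate.
        return self.gain * obs["los_rate"]


class PolicySelector:
    """High-level agent: activates one embedded low-level agent per decision step."""

    def __init__(self, agents):
        self.agents = agents

    def select(self, obs):
        # Stand-in for the learned selector: evade while the interceptor is
        # close, otherwise guide toward the target (threshold is illustrative).
        key = "evasion" if obs["interceptor_range"] < 5000.0 else "guidance"
        return self.agents[key]


def step(selector, obs):
    """One decision moment: the high level picks an agent, the low level acts."""
    agent = selector.select(obs)
    return agent.name, agent.act(obs)


agents = {
    "guidance": LowLevelAgent("guidance", gain=3.0),
    "evasion": LowLevelAgent("evasion", gain=-8.0),
}
selector = PolicySelector(agents)

print(step(selector, {"los_rate": 0.02, "interceptor_range": 20000.0}))  # guidance branch
print(step(selector, {"los_rate": 0.02, "interceptor_range": 2000.0}))   # evasion branch
```

In the paper, both levels are PPO policies with distinct reward functions (guidance accuracy, flight time, energy, field-of-view constraint); here the selector's branch condition merely illustrates where that learned switching decision sits in the control flow.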


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/4ce96fb065db/41598_2022_21756_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/c6bc8d2d38fc/41598_2022_21756_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/c2b46e89e087/41598_2022_21756_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/370e9a2048e4/41598_2022_21756_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/19d2bb9afe70/41598_2022_21756_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/838dc1a5a42d/41598_2022_21756_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/953f07d78613/41598_2022_21756_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/2180c8ed8192/41598_2022_21756_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/775d54e5b34e/41598_2022_21756_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/47b7cf177705/41598_2022_21756_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/0ebf3f31bade/41598_2022_21756_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/53e1395f0f50/41598_2022_21756_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/bbd69e7966c3/41598_2022_21756_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/8f254dae4170/41598_2022_21756_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/6e10e2ceee19/41598_2022_21756_Fig15a_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/1598a211196f/41598_2022_21756_Fig16_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/0cae784f920a/41598_2022_21756_Fig17_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/8c968497d36d/41598_2022_21756_Fig18_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/9640633/f5b78b9d2d90/41598_2022_21756_Fig19_HTML.jpg

Similar Articles

1. A hierarchical reinforcement learning method for missile evasion and guidance.
   Sci Rep. 2022 Nov 7;12(1):18888. doi: 10.1038/s41598-022-21756-6.
2. Intelligent air defense task assignment based on hierarchical reinforcement learning.
   Front Neurorobot. 2022 Dec 1;16:1072887. doi: 10.3389/fnbot.2022.1072887. eCollection 2022.
3. Reinforcement Learning From Hierarchical Critics.
   IEEE Trans Neural Netw Learn Syst. 2023 Feb;34(2):1066-1073. doi: 10.1109/TNNLS.2021.3103642. Epub 2023 Feb 3.
4. An Improved Distributed Sampling PPO Algorithm Based on Beta Policy for Continuous Global Path Planning Scheme.
   Sensors (Basel). 2023 Jul 2;23(13):6101. doi: 10.3390/s23136101.
5. Three-dimensional adaptive dynamic surface guidance law for missile with terminal angle and field-of-view constraints.
   ISA Trans. 2024 Nov;154:113-131. doi: 10.1016/j.isatra.2024.08.006. Epub 2024 Aug 8.
6. An LEO Constellation Early Warning System Decision-Making Method Based on Hierarchical Reinforcement Learning.
   Sensors (Basel). 2023 Feb 16;23(4):2225. doi: 10.3390/s23042225.
7. Proximal policy optimization-based reinforcement learning approach for DC-DC boost converter control: A comparative evaluation against traditional control techniques.
   Heliyon. 2024 Sep 11;10(18):e37823. doi: 10.1016/j.heliyon.2024.e37823. eCollection 2024 Sep 30.
8. Dual-Arm Robot Trajectory Planning Based on Deep Reinforcement Learning under Complex Environment.
   Micromachines (Basel). 2022 Mar 31;13(4):564. doi: 10.3390/mi13040564.
9. Hierarchical Attention Master-Slave for heterogeneous multi-agent reinforcement learning.
   Neural Netw. 2023 May;162:359-368. doi: 10.1016/j.neunet.2023.02.037. Epub 2023 Mar 4.
10. Time-to-go based three-dimensional multi-missile spatio-temporal cooperative guidance law: A novel approach for maneuvering target interception.
    ISA Trans. 2024 Jun;149:178-195. doi: 10.1016/j.isatra.2024.04.017. Epub 2024 Apr 16.

Cited By

1. Enhancing multi-UAV air combat decision making via hierarchical reinforcement learning.
   Sci Rep. 2024 Feb 23;14(1):4458. doi: 10.1038/s41598-024-54938-5.