Pursuit and Evasion Strategy of a Differential Game Based on Deep Reinforcement Learning.

Authors

Xu Can, Zhang Yin, Wang Weigang, Dong Ligang

Affiliations

School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China.

School of Information and Electronic Engineering, Sussex Artificial Intelligence Institute, Zhejiang Gongshang University, Hangzhou, China.

Publication details

Front Bioeng Biotechnol. 2022 Mar 22;10:827408. doi: 10.3389/fbioe.2022.827408. eCollection 2022.

DOI:10.3389/fbioe.2022.827408
PMID:35392407
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8980781/
Abstract

Since their emergence, deep neural networks (DNNs) have achieved excellent performance in many research areas. Deep reinforcement learning (DRL), which combines DNNs with reinforcement learning, has become a new paradigm for solving differential game problems. In this study, we build a reinforcement learning environment and apply DRL methods to a specific bio-inspired differential game: the dog sheep game. The game is set on a circle, where the dog chases the sheep as it attempts to escape. Under certain assumptions, we derive the kinematic pursuit and evasion strategies. We then apply the value-based deep Q-network (DQN) model and the deep deterministic policy gradient (DDPG) model to the dog sheep game, aiming to give the sheep the ability to escape successfully. To improve the performance of the DQN model, we propose a reward mechanism with a time-out strategy and a game environment with an attenuation mechanism for the sheep's steering angle. These modifications effectively increase the sheep's escape probability. Furthermore, the DDPG model is adopted because it supports a continuous action space. The results show that the modifications to the DQN model raise the escape probability to the same level as the DDPG model. Regarding learning ability under various environment difficulties, the refined DQN and DDPG models achieve larger performance gains over the naive evasion model in harsh environments than in loose ones.
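As a rough illustration of the kind of environment the abstract describes, the sketch below models a dog sheep game in plain Python with a Gym-style `(obs, reward, done, info)` step tuple. Everything concrete here is an assumption rather than the authors' setup: the hypothetical class `DogSheepEnv`, the sheep starting at the center while the dog is confined to the circumference and greedily takes the shorter arc, the speeds, the ±1 terminal rewards, the time-out penalty, and the specific form of steering attenuation (turns shrink linearly as the sheep nears the edge). The `step` method accepts either a discrete action index, as a DQN would choose, or a continuous steering value, as a DDPG actor would output.

```python
import math


class DogSheepEnv:
    """Minimal dog sheep pursuit-evasion environment (hypothetical sketch).

    Assumptions (not taken from the paper): the sheep starts at the center of a
    circle of radius `radius` and moves at constant speed; the dog is confined
    to the circumference, runs `speed_ratio` times faster, and always takes the
    shorter arc toward the sheep's polar angle.
    """

    def __init__(self, radius=1.0, sheep_speed=0.01, speed_ratio=3.5,
                 max_turn=math.pi / 6, attenuation=0.5, max_steps=500,
                 capture_tol=0.05):
        self.radius = radius
        self.sheep_speed = sheep_speed
        self.dog_omega = speed_ratio * sheep_speed / radius  # dog angular speed per step
        self.max_turn = max_turn        # largest heading change per step (rad)
        self.attenuation = attenuation  # hypothetical: damps steering near the edge
        self.max_steps = max_steps      # hypothetical time-out horizon
        self.capture_tol = capture_tol  # angular gap (rad) that counts as capture
        self.reset()

    @staticmethod
    def _wrap(angle):
        """Map an angle to [-pi, pi)."""
        return (angle + math.pi) % (2 * math.pi) - math.pi

    def reset(self):
        self.x = self.y = 0.0
        self.heading = 0.0
        self.dog_angle = math.pi  # dog starts opposite the sheep's initial heading
        self.steps = 0
        return self._obs()

    def _obs(self):
        return (self.x, self.y, self.heading, self.dog_angle)

    def step(self, action):
        """Advance one step.

        `action` is either an int in {0, 1, 2} (turn left / straight / right,
        a discrete set suitable for DQN) or a float in [-1, 1] (a continuous
        steering command, as a DDPG actor would output).
        """
        if isinstance(action, int):
            command = float(action - 1)
        else:
            command = max(-1.0, min(1.0, float(action)))
        # Hypothetical steering attenuation: the further out the sheep is,
        # the smaller the turn it can make in one step.
        r = math.hypot(self.x, self.y)
        scale = 1.0 - self.attenuation * min(r / self.radius, 1.0)
        self.heading += command * self.max_turn * scale
        self.x += self.sheep_speed * math.cos(self.heading)
        self.y += self.sheep_speed * math.sin(self.heading)

        # Dog runs along the circle toward the sheep's polar angle without overshooting.
        diff = self._wrap(math.atan2(self.y, self.x) - self.dog_angle)
        self.dog_angle += math.copysign(min(abs(diff), self.dog_omega), diff)
        self.steps += 1

        if math.hypot(self.x, self.y) >= self.radius:
            gap = abs(self._wrap(math.atan2(self.y, self.x) - self.dog_angle))
            if gap <= self.capture_tol:
                return self._obs(), -1.0, True, {"result": "caught"}
            return self._obs(), 1.0, True, {"result": "escaped"}
        if self.steps >= self.max_steps:
            # Time-out: penalize stalling (hypothetical reward value).
            return self._obs(), -1.0, True, {"result": "timeout"}
        return self._obs(), 0.0, False, {}
```

In this sketch, with `speed_ratio=3.5` (greater than π), a sheep that always runs straight away from the dog is caught: the dog covers the half-circle in about 90 steps, while the sheep needs about 100 steps to reach the edge. The time-out branch ends episodes in which the sheep merely circles near the center, so that stalling is not rewarded.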

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/8980781/a524cc3584ac/fbioe-10-827408-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/8980781/7c8187ff768d/fbioe-10-827408-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/8980781/96efb0da42a4/fbioe-10-827408-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/8980781/31dc1cef8c7c/fbioe-10-827408-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/8980781/7fdfc64a32d1/fbioe-10-827408-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/8980781/348d586447ad/fbioe-10-827408-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/8980781/024c339edd67/fbioe-10-827408-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/8980781/60cfa1a37c95/fbioe-10-827408-g007.jpg

Similar articles

1
Pursuit and Evasion Strategy of a Differential Game Based on Deep Reinforcement Learning.
Front Bioeng Biotechnol. 2022 Mar 22;10:827408. doi: 10.3389/fbioe.2022.827408. eCollection 2022.
2
Approximate Policy-Based Accelerated Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2020 Jun;31(6):1820-1830. doi: 10.1109/TNNLS.2019.2927227. Epub 2019 Aug 6.
3
Deep reinforcement learning for automated radiation adaptation in lung cancer.
Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.
4
A Deep Reinforcement Learning-Based MPPT Control for PV Systems under Partial Shading Condition.
Sensors (Basel). 2020 May 27;20(11):3039. doi: 10.3390/s20113039.
5
Application of Deep Reinforcement Learning to NS-SHAFT Game Signal Control.
Sensors (Basel). 2022 Jul 14;22(14):5265. doi: 10.3390/s22145265.
6
Joint Beamforming, Power Allocation, and Splitting Control for SWIPT-Enabled IoT Networks with Deep Reinforcement Learning and Game Theory.
Sensors (Basel). 2022 Mar 17;22(6):2328. doi: 10.3390/s22062328.
7
Intelligent maneuver strategy for hypersonic vehicles in three-player pursuit-evasion games via deep reinforcement learning.
Front Neurosci. 2024 Feb 14;18:1362303. doi: 10.3389/fnins.2024.1362303. eCollection 2024.
8
Deep deterministic policy gradient algorithm: A systematic review.
Heliyon. 2024 May 7;10(9):e30697. doi: 10.1016/j.heliyon.2024.e30697. eCollection 2024 May 15.
9
An Improved Approach towards Multi-Agent Pursuit-Evasion Game Decision-Making Using Deep Reinforcement Learning.
Entropy (Basel). 2021 Oct 29;23(11):1433. doi: 10.3390/e23111433.
10
Path-Tracking Control Strategy of Unmanned Vehicle Based on DDPG Algorithm.
Sensors (Basel). 2022 Oct 17;22(20):7881. doi: 10.3390/s22207881.

Cited by

1
Deep reinforcement learning enables adaptive-image augmentation for automated optical inspection of plant rust.
Front Plant Sci. 2023 Jul 7;14:1142957. doi: 10.3389/fpls.2023.1142957. eCollection 2023.
