End-to-End Autonomous Navigation Based on Deep Reinforcement Learning with a Survival Penalty Function.

Authors

Jeng Shyr-Long, Chiang Chienhsun

Affiliations

Department of Mechanical Engineering, Lunghwa University of Science and Technology, Taoyuan City 333326, Taiwan.

Department of Mechanical Engineering, National Yang Ming Chiao Tung University, Hsinchu City 300093, Taiwan.

Publication information

Sensors (Basel). 2023 Oct 23;23(20):8651. doi: 10.3390/s23208651.

DOI:10.3390/s23208651
PMID:37896743
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10610759/
Abstract

This paper proposes an end-to-end approach to autonomous navigation based on deep reinforcement learning (DRL) with a survival penalty function. Two actor-critic (AC) frameworks, namely deep deterministic policy gradient (DDPG) and twin-delayed DDPG (TD3), are employed to enable a nonholonomic wheeled mobile robot (WMR) to navigate in dynamic environments that contain obstacles and for which no maps are available. A comprehensive reward based on the survival penalty function is introduced; this approach effectively solves the sparse reward problem and enables the WMR to move toward its target. Consecutive episodes are connected to increase the cumulative penalty for scenarios involving obstacles; this method prevents training failure and enables the WMR to plan a collision-free path. Simulations are conducted for four scenarios (movement in an obstacle-free space, in a parking lot, at an intersection without and with a central obstacle, and in a multiple-obstacle space) to demonstrate the efficiency and operational safety of the method. For the same navigation environment, compared with the DDPG algorithm, the TD3 algorithm exhibits faster numerical convergence and higher stability in the training phase, as well as a higher task execution success rate in the evaluation phase.
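
The abstract does not state the form of the survival penalty function, so the following is only a minimal sketch under stated assumptions. It assumes the penalty is a constant negative reward charged on every time step in which the robot has neither reached the target nor collided, combined with a distance-progress term so that the reward is dense rather than sparse. The function names, arguments, and all constants (`step_penalty`, `progress_gain`, `goal_bonus`, `collision_penalty`) are hypothetical and are not taken from the paper. The second and third functions show the two standard ingredients that distinguish TD3 from DDPG: the clipped double-Q target and target policy smoothing.

```python
import math
import random


def survival_penalty_reward(prev_dist, dist, reached, collided,
                            step_penalty=0.05, progress_gain=1.0,
                            goal_bonus=10.0, collision_penalty=10.0):
    """Hypothetical dense reward; every constant here is illustrative only."""
    if reached:
        return goal_bonus
    if collided:
        return -collision_penalty
    # Assumed survival penalty: a fixed cost for each step spent without
    # reaching the target, plus a reward for reducing the distance to it.
    return -step_penalty + progress_gain * (prev_dist - dist)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller target-critic value."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def smoothed_target_action(action, sigma=0.2, clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clamp to bounds."""
    noise = max(-clip, min(clip, rng.gauss(0.0, sigma)))
    return max(low, min(high, action + noise))


if __name__ == "__main__":
    # One step that moves the robot 0.2 m closer to the target.
    r = survival_penalty_reward(prev_dist=2.0, dist=1.8,
                                reached=False, collided=False)
    # Toy critic outputs for the next state; a real agent would use networks.
    y = td3_target(r, done=False, q1_next=1.0, q2_next=0.8)
    print(f"reward={r:.3f} target={y:.3f}")
```

Under these assumptions a robot that stands still accumulates a negative return, which is one plausible way a per-step penalty can push an agent toward its target when the goal reward alone would be too sparse. Taking the minimum of two critics in `td3_target` is TD3's guard against the value overestimation that DDPG's single critic is prone to.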


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/88ff6f7d9d02/sensors-23-08651-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/939fd38eaa2a/sensors-23-08651-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/d3228dc46770/sensors-23-08651-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/c75f7669069d/sensors-23-08651-g004a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/02ff4bfaca13/sensors-23-08651-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/12a65ed4b31c/sensors-23-08651-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/acd51bce9fd1/sensors-23-08651-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/30577f27c6bb/sensors-23-08651-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/2067f9606f44/sensors-23-08651-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/6d4441e571c4/sensors-23-08651-g010a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/0b2ef577c2ed/sensors-23-08651-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/e8fe73e3f729/sensors-23-08651-g012a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e34c/10610759/ba5c5d666de1/sensors-23-08651-g013.jpg

Similar articles

1
End-to-End Autonomous Navigation Based on Deep Reinforcement Learning with a Survival Penalty Function.
Sensors (Basel). 2023 Oct 23;23(20):8651. doi: 10.3390/s23208651.
2
Deep Reinforcement Learning-Based Accurate Control of Planetary Soft Landing.
Sensors (Basel). 2021 Dec 6;21(23):8161. doi: 10.3390/s21238161.
3
Deep Deterministic Policy Gradient-Based Autonomous Driving for Mobile Robots in Sparse Reward Environments.
Sensors (Basel). 2022 Dec 7;22(24):9574. doi: 10.3390/s22249574.
4
Deep reinforcement learning-aided autonomous navigation with landmark generators.
Front Neurorobot. 2023 Aug 22;17:1200214. doi: 10.3389/fnbot.2023.1200214. eCollection 2023.
5
Predictive hierarchical reinforcement learning for path-efficient mapless navigation with moving target.
Neural Netw. 2023 Aug;165:677-688. doi: 10.1016/j.neunet.2023.06.007. Epub 2023 Jun 10.
6
Deep deterministic policy gradient algorithm: A systematic review.
Heliyon. 2024 May 7;10(9):e30697. doi: 10.1016/j.heliyon.2024.e30697. eCollection 2024 May 15.
7
Efficient Path Planning for Mobile Robot Based on Deep Deterministic Policy Gradient.
Sensors (Basel). 2022 May 8;22(9):3579. doi: 10.3390/s22093579.
8
Reinforcement learning-based dynamic obstacle avoidance and integration of path planning.
Intell Serv Robot. 2021;14(5):663-677. doi: 10.1007/s11370-021-00387-2. Epub 2021 Oct 6.
9
Deep Reinforcement Learning for Autonomous Driving with an Auxiliary Actor Discriminator.
Sensors (Basel). 2024 Jan 22;24(2):700. doi: 10.3390/s24020700.
10
SLP-Improved DDPG Path-Planning Algorithm for Mobile Robot in Large-Scale Dynamic Environment.
Sensors (Basel). 2023 Mar 28;23(7):3521. doi: 10.3390/s23073521.
