

Learning Reward Function with Matching Network for Mapless Navigation

Affiliations

Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China.

The School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China.

Publication Information

Sensors (Basel). 2020 Jun 30;20(13):3664. doi: 10.3390/s20133664.

DOI: 10.3390/s20133664
PMID: 32629934
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7374413/
Abstract

Deep reinforcement learning (DRL) has been successfully applied in mapless navigation. An important issue in DRL is to design a reward function for evaluating actions of agents. However, designing a robust and suitable reward function greatly depends on the designer's experience and intuition. To address this concern, we consider employing reward shaping from trajectories on similar navigation tasks without human supervision, and propose a general reward function based on matching network (MN). The MN-based reward function is able to gain experience by pre-training on trajectories from different navigation tasks and accelerate the training speed of DRL in new tasks. The proposed reward function keeps the optimal strategy of DRL unchanged. The simulation results on two static maps show that the DRL converges in fewer iterations via the learned reward function than with state-of-the-art mapless navigation methods. The proposed method performs well in dynamic maps with partially moving obstacles. Even when test maps are different from training maps, the proposed strategy is able to complete the navigation tasks without additional training.
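The property that the learned reward "keeps the optimal strategy of DRL unchanged" is the hallmark of potential-based reward shaping, in which the shaped reward takes the form r'(s, a, s') = r(s, a, s') + γΦ(s') − Φ(s) for some potential Φ over states. The sketch below illustrates only that mechanism; the cosine-similarity potential computed against stored trajectory states is a hypothetical stand-in for the paper's pre-trained matching network, and every function name in it is illustrative rather than taken from the authors' code.

```python
# Minimal sketch of potential-based reward shaping (Ng et al., 1999), the
# mechanism behind the claim that the shaped reward leaves the optimal
# policy unchanged. The similarity-based potential is a placeholder for the
# paper's matching network, not a reproduction of it.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between the current observation and one reference observation."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def potential(state: np.ndarray, reference_states: list) -> float:
    """Phi(s): best match against states collected on earlier navigation tasks
    (a stand-in for the score produced by the pre-trained matching network)."""
    return max(cosine_similarity(state, ref) for ref in reference_states)


def shaped_reward(env_reward: float, state: np.ndarray, next_state: np.ndarray,
                  reference_states: list, gamma: float = 0.99) -> float:
    """r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s).

    Because the extra term telescopes, it shifts every action value at a state
    by the same amount (-Phi(s)), so the greedy/optimal policy is preserved.
    """
    shaping = (gamma * potential(next_state, reference_states)
               - potential(state, reference_states))
    return env_reward + shaping


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = [rng.normal(size=8) for _ in range(5)]        # states from prior-task trajectories
    s, s_next = rng.normal(size=8), rng.normal(size=8)   # one transition in the new task
    print(shaped_reward(-0.01, s, s_next, refs))         # sparse step cost plus shaping term
```

In a DRL training loop the shaping term would simply be added to the environment reward before each transition is stored in the replay buffer, so the underlying algorithm (e.g., an actor-critic method) needs no modification.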


Figures (from the PMC full text, g001–g015):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/6caa8754d8d9/sensors-20-03664-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/c9f235755b79/sensors-20-03664-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/04c0ced79b36/sensors-20-03664-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/098858329294/sensors-20-03664-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/f8915b713ceb/sensors-20-03664-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/c5acdc7550ab/sensors-20-03664-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/3913a517f04b/sensors-20-03664-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/09b9c64e5cec/sensors-20-03664-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/2b8ee0e08dee/sensors-20-03664-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/80ef8ff05e90/sensors-20-03664-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/6f577a79998a/sensors-20-03664-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/f83224b2bd97/sensors-20-03664-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/d6338636fa6d/sensors-20-03664-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/5902125da8f6/sensors-20-03664-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99f3/7374413/95c5095d6df8/sensors-20-03664-g015.jpg

Similar Articles

1. Learning Reward Function with Matching Network for Mapless Navigation.
Sensors (Basel). 2020 Jun 30;20(13):3664. doi: 10.3390/s20133664.
2. Predictive hierarchical reinforcement learning for path-efficient mapless navigation with moving target.
Neural Netw. 2023 Aug;165:677-688. doi: 10.1016/j.neunet.2023.06.007. Epub 2023 Jun 10.
3. The Impact of LiDAR Configuration on Goal-Based Navigation within a Deep Reinforcement Learning Framework.
Sensors (Basel). 2023 Dec 9;23(24):9732. doi: 10.3390/s23249732.
4. Deep reinforcement learning for automated radiation adaptation in lung cancer.
Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.
5. Mapless Path Planning for Mobile Robot Based on Improved Deep Deterministic Policy Gradient Algorithm.
Sensors (Basel). 2024 Aug 30;24(17):5667. doi: 10.3390/s24175667.
6. Transformable Gaussian Reward Function for Socially Aware Navigation Using Deep Reinforcement Learning.
Sensors (Basel). 2024 Jul 13;24(14):4540. doi: 10.3390/s24144540.
7. Leveraging Expert Demonstration Features for Deep Reinforcement Learning in Floor Cleaning Robot Navigation.
Sensors (Basel). 2022 Oct 12;22(20):7750. doi: 10.3390/s22207750.
8. DRL-RNP: Deep Reinforcement Learning-Based Optimized RNP Flight Procedure Execution.
Sensors (Basel). 2022 Aug 28;22(17):6475. doi: 10.3390/s22176475.
9. Deep reinforcement learning-aided autonomous navigation with landmark generators.
Front Neurorobot. 2023 Aug 22;17:1200214. doi: 10.3389/fnbot.2023.1200214. eCollection 2023.
10. Learning Autonomous Navigation in Unmapped and Unknown Environments.
Sensors (Basel). 2024 Sep 12;24(18):5925. doi: 10.3390/s24185925.

Cited By

1. Inspection Robot Navigation Based on Improved TD3 Algorithm.
Sensors (Basel). 2024 Apr 15;24(8):2525. doi: 10.3390/s24082525.
2. Modeling Car-Following Behaviors and Driving Styles with Generative Adversarial Imitation Learning.
Sensors (Basel). 2020 Sep 4;20(18):5034. doi: 10.3390/s20185034.

References

1. Deep Reinforcement Learning Approach with Multiple Experience Pools for UAV's Autonomous Motion Planning in Complex Unknown Environments.
Sensors (Basel). 2020 Mar 29;20(7):1890. doi: 10.3390/s20071890.
2. Learning Mobile Manipulation through Deep Reinforcement Learning.
Sensors (Basel). 2020 Feb 10;20(3):939. doi: 10.3390/s20030939.