Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment.

Authors

Nguyen Quang Dang, Prokopenko Mikhail

Affiliation

Centre for Complex Systems, Faculty of Engineering, University of Sydney, Sydney, NSW, Australia.

Publication

Front Robot AI. 2020 Sep 16;7:123. doi: 10.3389/frobt.2020.00123. eCollection 2020.

DOI:10.3389/frobt.2020.00123
PMID:33501289
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7805756/
Abstract

We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, , in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), enabling a search for local maxima during a training phase. The extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution.
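The abstract splits the decision into two parts: a Q-value network that picks a discrete action, and a separate network that supplies that action's continuous parameters, trained to imitate an expert team and later extended with a delayed, performance-correlated reward. The sketch below illustrates that general split under loud assumptions. It is not the authors' implementation: single linear layers stand in for the deep networks, and the sizes `STATE_DIM`, `N_ACTIONS`, `N_PARAMS`, the class `HybridPolicy` and the function `delayed_returns` are all hypothetical. Imitation is modelled as plain supervised learning (cross-entropy on the Q-head, squared error on the parameter head). The delayed reward is shown as a standard discounted return computed from a single end-of-episode reward.

```python
import numpy as np

# Hypothetical sizes: the abstract does not give the real state features,
# action set or network depths.
STATE_DIM = 8
N_ACTIONS = 3   # illustrative discrete behaviours
N_PARAMS = 2    # illustrative continuous parameters per behaviour


class HybridPolicy:
    """Discrete action from a Q-value head, continuous parameters from a second head.

    Both heads are single linear layers here, standing in for deep networks.
    """

    def __init__(self, rng):
        self.wq = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))
        self.wp = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS * N_PARAMS))

    def q_values(self, s):
        return s @ self.wq

    def params(self, s):
        # One parameter vector per discrete action.
        return (s @ self.wp).reshape(N_ACTIONS, N_PARAMS)

    def act(self, s):
        # Action determination: greedy over Q-values, then look up its parameters.
        a = int(np.argmax(self.q_values(s)))
        return a, self.params(s)[a]

    def imitation_step(self, s, expert_a, expert_p, lr=0.1):
        """One supervised update toward a single expert (state, action, parameters) sample."""
        # Cross-entropy on softmax(Q): gradient w.r.t. Q is softmax - one_hot(expert_a).
        q = self.q_values(s)
        probs = np.exp(q - q.max())
        probs /= probs.sum()
        grad_q = probs - np.eye(N_ACTIONS)[expert_a]
        self.wq -= lr * np.outer(s, grad_q)
        # Squared error, applied only to the parameters of the expert's action.
        p = self.params(s)
        grad_p = np.zeros_like(p)
        grad_p[expert_a] = p[expert_a] - expert_p
        self.wp -= lr * np.outer(s, grad_p.ravel())


def delayed_returns(n_steps, final_reward, gamma=0.9):
    """Discounted return per step when the only reward arrives after the last step.

    G_t = gamma ** (T - 1 - t) * r_T, so earlier steps receive less credit.
    """
    return np.array([final_reward * gamma ** (n_steps - 1 - t) for t in range(n_steps)])


# Usage: imitate one hypothetical expert sample, then query the policy.
rng = np.random.default_rng(0)
policy = HybridPolicy(rng)
s = rng.normal(size=STATE_DIM)
s /= np.linalg.norm(s)  # unit-norm state keeps the fixed learning rate stable in this toy
for _ in range(200):
    policy.imitation_step(s, expert_a=2, expert_p=np.array([0.5, -0.5]))
a, p = policy.act(s)
returns = delayed_returns(3, 1.0, gamma=0.5)
```

Keeping one parameter vector per discrete action lets the regression loss touch only the chosen action's parameters, which is one plausible way to separate "action determination" from "parameter learning" as the abstract describes. How the paper actually combines the imitation target with the delayed reward is not specified here.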

Similar articles

1. Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment.
   Front Robot AI. 2020 Sep 16;7:123. doi: 10.3389/frobt.2020.00123. eCollection 2020.
2. Deep reinforcement learning for automated radiation adaptation in lung cancer.
   Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.
3. Modular deep reinforcement learning from reward and punishment for robot navigation.
   Neural Netw. 2021 Mar;135:115-126. doi: 10.1016/j.neunet.2020.12.001. Epub 2020 Dec 8.
4. Cooperative and Competitive Reinforcement and Imitation Learning for a Mixture of Heterogeneous Learning Modules.
   Front Neurorobot. 2018 Sep 27;12:61. doi: 10.3389/fnbot.2018.00061. eCollection 2018.
5. Moving robotics competitions virtual: The case study of RoboCupJunior Soccer Simulation (SoccerSim).
   Front Robot AI. 2022 Aug 15;9:915322. doi: 10.3389/frobt.2022.915322. eCollection 2022.
6. Multi-Agent Decision-Making Modes in Uncertain Interactive Traffic Scenarios via Graph Convolution-Based Deep Reinforcement Learning.
   Sensors (Basel). 2022 Jun 17;22(12):4586. doi: 10.3390/s22124586.
7. Deep Reinforcement Learning for Indoor Mobile Robot Path Planning.
   Sensors (Basel). 2020 Sep 25;20(19):5493. doi: 10.3390/s20195493.
8. Deep Reinforcement Learning on Autonomous Driving Policy With Auxiliary Critic Network.
   IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3680-3690. doi: 10.1109/TNNLS.2021.3116063. Epub 2023 Jul 6.
9. Velocity range-based reward shaping technique for effective map-less navigation with LiDAR sensor and deep reinforcement learning.
   Front Neurorobot. 2023 Sep 6;17:1210442. doi: 10.3389/fnbot.2023.1210442. eCollection 2023.
10. Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems.
    IEEE Trans Neural Netw Learn Syst. 2023 May;34(5):2386-2399. doi: 10.1109/TNNLS.2021.3106635. Epub 2023 May 2.

Cited by

1. A general framework for optimising cost-effectiveness of pandemic response under partial intervention measures.
   Sci Rep. 2022 Nov 14;12(1):19482. doi: 10.1038/s41598-022-23668-x.
