Towards a Broad-Persistent Advising Approach for Deep Interactive Reinforcement Learning in Robotic Environments.

Affiliations

School of Information Technology, Deakin University, Geelong 3220, Australia.

School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia.

Publication information

Sensors (Basel). 2023 Mar 1;23(5):2681. doi: 10.3390/s23052681.

Abstract

Deep Reinforcement Learning (DeepRL) methods have been widely used in robotics to learn about the environment and acquire behaviours autonomously. Deep Interactive Reinforcement Learning (DeepIRL) adds interactive feedback from an external trainer or expert, whose advice helps the learner choose actions and speeds up the learning process. However, current research has been limited to interactions that offer actionable advice for only the current state of the agent. Additionally, the agent discards the information after a single use, so the same process must be repeated when the state is revisited. In this paper, we present Broad-Persistent Advising (BPA), an approach that retains and reuses the processed information. It not only helps trainers give more general advice that is relevant to similar states instead of only the current state, but also allows the agent to speed up the learning process. We tested the proposed approach in two continuous robotic scenarios, namely a cart pole balancing task and a simulated robot navigation task. The results demonstrated that the agent's learning speed increased, as evidenced by reward points rising by up to 37%, while the number of interactions required from the trainer was maintained, in comparison to the DeepIRL approach.
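
The retain-and-reuse idea described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class name `PersistentAdviceStore`, the fixed-width binning used to decide which continuous states count as "similar", and the `reuse_prob` parameter are hypothetical and are not taken from the paper, which may group states and reuse advice differently.

```python
import math
import random
from typing import Dict, Optional, Sequence, Tuple


class PersistentAdviceStore:
    """Hypothetical store that keeps trainer advice and reuses it for similar states."""

    def __init__(self, bin_width: float = 0.5, reuse_prob: float = 0.9) -> None:
        # Assumption: states falling in the same fixed-width bin are "similar".
        self.bin_width = bin_width
        # Assumption: retained advice is followed with this probability.
        self.reuse_prob = reuse_prob
        self._advice: Dict[Tuple[int, ...], int] = {}

    def _key(self, state: Sequence[float]) -> Tuple[int, ...]:
        # Map a continuous state to a discrete bin index per dimension.
        return tuple(math.floor(x / self.bin_width) for x in state)

    def record(self, state: Sequence[float], action: int) -> None:
        # Persist the advice instead of discarding it after one use.
        self._advice[self._key(state)] = action

    def recall(self, state: Sequence[float]) -> Optional[int]:
        # Return retained advice for any state in the same bin, if present.
        action = self._advice.get(self._key(state))
        if action is None:
            return None
        # Follow the retained advice only with probability reuse_prob;
        # otherwise the caller falls back to the agent's own policy.
        return action if random.random() < self.reuse_prob else None
```

In an interaction loop, the agent would first call `recall(state)`. If it returns `None`, the agent either asks the trainer (and passes the answer to `record`) or acts on its own policy. Because advice is keyed by bin rather than by exact state, a single piece of advice covers every state in that bin, and a revisit does not need a new trainer interaction.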
