

Decomposing user-defined tasks in a reinforcement learning setup using TextWorld.

Author information

Petsanis Thanos, Keroglou Christoforos, Ch Kapoutsis Athanasios, Kosmatopoulos Elias B, Sirakoulis Georgios Ch

Affiliations

School of Engineering, Department of Electrical and Computer Engineering, Democritus University of Thrace (DUTH), Xanthi, Greece.

The Centre for Research and Technology, Information Technologies Institute, Thessaloniki, Greece.

Publication information

Front Robot AI. 2023 Dec 22;10:1280578. doi: 10.3389/frobt.2023.1280578. eCollection 2023.

Abstract

This paper proposes a hierarchical reinforcement learning (HRL) method that decomposes a complex task into simpler sub-tasks and leverages them to improve the training of an autonomous agent in a simulated environment. For practical reasons (illustrative value, ease of implementation, a user-friendly interface, and useful functionality), we employ two Python frameworks, TextWorld and MiniGrid. MiniGrid serves as a 2D simulated representation of the real environment, while TextWorld serves as a high-level abstraction of this simulated environment. Training on this abstraction disentangles manipulation actions from navigation actions and allows us to design a dense reward function, rather than a sparse one, for the lower-level environment, which, as we show, improves training performance. Formal methods are used throughout the paper to establish that our algorithm is not prevented from deriving solutions.
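The difference between a sparse and a dense reward can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the sub-task strings, and the equal `1 / len(plan)` reward split are all hypothetical, and no TextWorld or MiniGrid API is used. It assumes the high-level abstraction has already produced an ordered list of sub-tasks and that the low-level environment reports one event string per step.

```python
from typing import List


def sparse_rewards(plan: List[str], events: List[str]) -> List[float]:
    """Per-step rewards that are non-zero only when the whole plan is finished."""
    rewards = []
    done = 0  # number of sub-tasks completed so far, in plan order
    for event in events:
        if done < len(plan) and event == plan[done]:
            done += 1
            # Only the completion of the final sub-task is rewarded.
            rewards.append(1.0 if done == len(plan) else 0.0)
        else:
            rewards.append(0.0)
    return rewards


def dense_rewards(plan: List[str], events: List[str]) -> List[float]:
    """Per-step rewards that pay an equal share for every completed sub-task."""
    rewards = []
    done = 0
    for event in events:
        if done < len(plan) and event == plan[done]:
            done += 1
            # Hypothetical choice: split a total reward of 1.0 evenly.
            rewards.append(1.0 / len(plan))
        else:
            rewards.append(0.0)
    return rewards


# Hypothetical sub-task plan and low-level event trace.
PLAN = ["go to key", "pick up key", "go to door", "open door"]
EVENTS = ["go to key", "wander", "pick up key", "go to door", "open door"]

if __name__ == "__main__":
    print("sparse:", sparse_rewards(PLAN, EVENTS))
    print("dense: ", dense_rewards(PLAN, EVENTS))
```

Both schemes give the same total return for a completed episode, but the dense variant provides a learning signal after every sub-task instead of only at the end, which is the property the abstract credits for the improved training.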


