Learning State-Variable Relationships in POMCP: A Framework for Mobile Robots.

Authors

Zuccotto Maddalena, Piccinelli Marco, Castellini Alberto, Marchesini Enrico, Farinelli Alessandro

Affiliation

Department of Computer Science, University of Verona, Verona, Italy.

Publication

Front Robot AI. 2022 Jul 19;9:819107. doi: 10.3389/frobt.2022.819107. eCollection 2022.

Abstract

We address the problem of learning relationships among state variables in Partially Observable Markov Decision Processes (POMDPs) to improve planning performance. Specifically, we focus on Partially Observable Monte Carlo Planning (POMCP) and represent the acquired knowledge with a Markov Random Field (MRF). We propose a method for learning these relationships on a robot while POMCP is used to plan future actions. We then present an algorithm for cases in which the MRF is used on episodes whose states are unlikely with respect to the equality relationships it represents. Our approach acquires information from the outcomes of the agent's actions and adapts the MRF online if a mismatch is detected between the MRF and the true state. We test this technique on two domains: rocksample, a standard rover exploration task, and a velocity regulation problem in industrial mobile robotic platforms. The results show that the MRF adaptation algorithm improves planning performance with respect to the standard approach, which does not adapt the MRF online. Finally, we propose a ROS-based architecture that allows running MRF learning, MRF adaptation, and MRF usage in POMCP on real robotic platforms. We successfully tested this architecture on a Gazebo simulator of rocksample. A video of the experiments is available in the Supplementary Material, and the code of the ROS-based architecture is available online.
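The idea of encoding equality relationships in an MRF and relaxing them when they conflict with the true state can be illustrated with a small sketch. This is not the authors' implementation: the class `EqualityMRF`, its methods, the binary state variables, the brute-force sampling, and the rule of resetting a contradicted edge to 0.5 are all assumptions made for illustration. In a rocksample-like setting, each variable could stand for whether a rock is valuable, and each edge for the learned probability that two rocks share the same value.

```python
import random


class EqualityMRF:
    """Hypothetical pairwise MRF over binary state variables.

    Each edge (i, j) stores an assumed probability that x_i equals x_j.
    """

    def __init__(self, n_vars, edges):
        self.n_vars = n_vars
        self.edges = dict(edges)  # {(i, j): P(x_i == x_j)}

    def weight(self, state):
        """Unnormalized probability of a full assignment (product of edge potentials)."""
        w = 1.0
        for (i, j), p_eq in self.edges.items():
            w *= p_eq if state[i] == state[j] else 1.0 - p_eq
        return w

    def sample_particles(self, n, rng):
        """Draw n initial particles by enumerating all assignments.

        Enumeration is only feasible for a handful of variables; a real
        system would need an approximate sampler.
        """
        states = [
            tuple((k >> b) & 1 for b in range(self.n_vars))
            for k in range(2 ** self.n_vars)
        ]
        weights = [self.weight(s) for s in states]
        return rng.choices(states, weights=weights, k=n)

    def adapt(self, known_values):
        """Relax every edge contradicted by values revealed through action outcomes.

        Assumed rule: if both endpoints are known and their (in)equality
        disagrees with what the edge favors, reset the edge to 0.5.
        """
        for (i, j), p_eq in list(self.edges.items()):
            if i in known_values and j in known_values:
                observed_equal = known_values[i] == known_values[j]
                if (p_eq > 0.5) != observed_equal:
                    self.edges[(i, j)] = 0.5


mrf = EqualityMRF(3, {(0, 1): 0.9, (1, 2): 0.9})
particles = mrf.sample_particles(100, random.Random(0))
mrf.adapt({0: 1, 1: 0})  # variables 0 and 1 turned out to differ
```

After the call to `adapt`, the edge between variables 0 and 1 no longer biases the particles, while the edge between variables 1 and 2 is left unchanged because nothing has contradicted it.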

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b45/9343685/a97dc20ce510/frobt-09-819107-g001.jpg
