A framework for neurosymbolic robot action planning using large language models.

Author information

Capitanelli Alessio, Mastrogiovanni Fulvio

Affiliation

Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy.

Publication information

Front Neurorobot. 2024 Jun 4;18:1342786. doi: 10.3389/fnbot.2024.1342786. eCollection 2024.

Abstract

Symbolic task planning is a widely used approach to enforce robot autonomy due to its ease of understanding and deployment in engineered robot architectures. However, techniques for symbolic task planning are difficult to scale in real-world, highly dynamic, human-robot collaboration scenarios because of their poor performance in planning domains where action effects may not be immediate, or when frequent re-planning is needed due to changed circumstances in the robot workspace. The long-term validity of plans, plan length, and planning time could hinder the robot's efficiency and negatively affect the fluency of the overall human-robot interaction. We present a framework, which we refer to as Teriyaki, specifically aimed at bridging the gap between symbolic task planning and machine learning approaches. The rationale is to train Large Language Models (LLMs), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL), and then to leverage their generative capabilities to overcome a number of limitations inherent to symbolic task planners. Potential benefits include (i) better scalability as planning domain complexity increases, since LLMs' response time scales linearly with the combined length of the input and the output, instead of super-linearly as in the case of symbolic task planners, and (ii) the ability to synthesize a plan action-by-action instead of end-to-end, and to make each action available for execution as soon as it is generated instead of waiting for the whole plan to be available, which in turn enables concurrent planning and execution. In the past year, the research community has devoted significant effort to evaluating the overall cognitive capabilities of LLMs, with mixed success. Instead, with Teriyaki we aim to provide an overall planning performance comparable to that of traditional planners in specific planning domains, while leveraging LLMs' capabilities in other metrics, specifically those related to their short- and mid-term generative capabilities, which are used to build a look-ahead predictive planning model. Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce average overall waiting times for plan availability by up to 61.4%.
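
The action-by-action synthesis described in point (ii) can be illustrated with a minimal Python sketch. It is not the Teriyaki implementation: `generate_plan` is a hypothetical stand-in for a planner that emits one PDDL-style action at a time (here with a simulated per-action delay), and `execute_concurrently` is an assumed consumer that dispatches each action as soon as it arrives instead of waiting for the full plan. The action strings are invented for illustration.

```python
import queue
import threading
import time
from typing import Iterator, List, Optional, Tuple


def generate_plan(actions: List[str], delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a planner that streams one action at a time."""
    for action in actions:
        time.sleep(delay_s)  # simulated per-action generation latency
        yield action


def _planner_worker(plan_stream: Iterator[str], out: "queue.Queue") -> None:
    """Push each generated action to the queue, then a None sentinel."""
    for action in plan_stream:
        out.put(action)
    out.put(None)


def execute_concurrently(plan_stream: Iterator[str]) -> Tuple[List[str], Optional[float]]:
    """Consume actions while the planner is still generating the rest.

    Returns the actions in execution order and the wait (in seconds) before
    the first action became available, or None if the plan was empty.
    """
    actions: "queue.Queue" = queue.Queue()
    start = time.monotonic()
    worker = threading.Thread(target=_planner_worker, args=(plan_stream, actions))
    worker.start()

    executed: List[str] = []
    first_wait: Optional[float] = None
    while True:
        action = actions.get()  # blocks until the next action is ready
        if action is None:      # sentinel: planner has finished
            break
        if first_wait is None:
            first_wait = time.monotonic() - start
        executed.append(action)  # placeholder for dispatching to the robot
    worker.join()
    return executed, first_wait


if __name__ == "__main__":
    demo_plan = ["(pick block_a)", "(move block_a table)", "(place block_a)"]
    done, wait = execute_concurrently(generate_plan(demo_plan, 0.05))
    print(done, wait)
```

In this sketch the executor's wait before acting is roughly one action's generation time rather than the time for the whole plan, which is the intuition behind the reduced waiting times reported above. A real system would also need to handle re-planning when an already dispatched action invalidates the rest of the stream.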

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e6e/11184123/fbca2671e1db/fnbot-18-1342786-g0001.jpg
