Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, Christopher G. Lucas
University of Edinburgh, Edinburgh, UK.
Massachusetts Institute of Technology, Cambridge, MA, USA.
Nat Mach Intell. 2025;7(4):592-601. doi: 10.1038/s42256-025-01005-x. Epub 2025 Mar 19.
Completing complex tasks in unpredictable settings challenges robotic systems, requiring a step change in machine intelligence. Sensorimotor abilities are considered integral to human intelligence. Thus, biologically inspired machine intelligence might usefully combine artificial intelligence with robotic sensorimotor capabilities. Here we report an embodied large-language-model-enabled robot (ELLMER) framework, utilizing GPT-4 and a retrieval-augmented generation infrastructure, to enable robots to complete long-horizon tasks in unpredictable settings. The method extracts contextually relevant examples from a knowledge base, producing action plans that incorporate force and visual feedback and enabling adaptation to changing conditions. We tested ELLMER on a robot tasked with coffee making and plate decoration; these tasks consist of a sequence of sub-tasks from drawer opening to pouring, each benefiting from distinct feedback types and methods. We show that the ELLMER framework allows the robot to complete the tasks. This demonstration marks progress towards scalable, efficient and 'intelligent robots' able to complete complex tasks in uncertain environments.
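The retrieval-augmented generation step described above can be illustrated with a minimal sketch: given an embedding of the task query, rank knowledge-base examples by cosine similarity and return the most relevant ones to seed the LLM's action plan. All names, vectors, and the toy knowledge base below are illustrative assumptions, not details from the ELLMER paper.

```python
# Minimal sketch of the retrieval step in a RAG pipeline: rank stored
# examples by cosine similarity of precomputed embeddings and return
# the top-k to include in the language model's prompt context.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, knowledge_base, k=2):
    """Return the k knowledge-base entries most similar to the query."""
    ranked = sorted(knowledge_base,
                    key=lambda e: cosine(query_vec, e["embedding"]),
                    reverse=True)
    return ranked[:k]

# Toy knowledge base: each entry pairs a skill name with an embedding.
kb = [
    {"skill": "open_drawer", "embedding": [0.9, 0.1, 0.0]},
    {"skill": "pour_liquid", "embedding": [0.1, 0.9, 0.2]},
    {"skill": "wipe_table",  "embedding": [0.2, 0.1, 0.9]},
]

# Hypothetical embedding of a query like "open the drawer and fetch a cup".
query = [0.85, 0.2, 0.05]
for entry in retrieve(query, kb):
    print(entry["skill"])
```

In a real system the embeddings would come from a learned encoder and the retrieved entries would carry full action plans with force and visual feedback hooks; the ranking logic, however, is essentially this simple.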