Continuous learning of emergent behavior in robotic matter.

Affiliation

Designer Matter Department, AMOLF, 1098 XG Amsterdam, The Netherlands.

Publication information

Proc Natl Acad Sci U S A. 2021 May 25;118(21). doi: 10.1073/pnas.2017015118.

DOI:10.1073/pnas.2017015118

PMID:33972408

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8166149/

Abstract

One of the main challenges in robotics is the development of systems that can adapt to their environment and achieve autonomous behavior. Current approaches typically aim to achieve this by increasing the complexity of the centralized controller by, e.g., direct modeling of their behavior, or implementing machine learning. In contrast, we simplify the controller using a decentralized and modular approach, with the aim of finding specific requirements needed for a robust and scalable learning strategy in robots. To achieve this, we conducted experiments and simulations on a specific robotic platform assembled from identical autonomous units that continuously sense their environment and react to it. By letting each unit adapt its behavior independently using a basic Monte Carlo scheme, the assembled system is able to learn and maintain optimal behavior in a dynamic environment as long as its memory is representative of the current environment, even when incurring damage. We show that the physical connection between the units is enough to achieve learning, and no additional communication or centralized information is required. As a result, such a distributed learning approach can be easily scaled to larger assemblies, blurring the boundaries between materials and robots, paving the way for a new class of modular "robotic matter" that can autonomously learn to thrive in dynamic or unfamiliar situations, for example, encountered by soft robots or self-assembled (micro)robots in various environments spanning from the medical realm to space explorations.
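The decentralized Monte Carlo idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the `Unit` class, the per-unit `phase` parameter, the step size, and above all the toy `assembly_velocity` function (a stand-in for whatever the physically connected assembly would actually measure) are assumptions made for illustration only. Each unit perturbs its own parameter, reads the one quantity it can sense through the shared body, and keeps or reverts the change without exchanging any messages.

```python
import math
import random


def assembly_velocity(phases):
    """Hypothetical toy model of the speed of a chain of connected units.

    It peaks at 1.0 when each unit lags its neighbor by a quarter cycle.
    It stands in for a physical measurement and is not taken from the paper.
    """
    diffs = [phases[i + 1] - phases[i] for i in range(len(phases) - 1)]
    return sum(math.sin(d) for d in diffs) / len(diffs)


class Unit:
    """One autonomous module that knows only its own phase and sensor reading."""

    def __init__(self, rng, step=0.3):
        self.rng = rng                      # private random source, not shared
        self.step = step                    # assumed size of a random perturbation
        self.phase = rng.uniform(0.0, 2.0 * math.pi)
        self.best_phase = self.phase        # memory: last accepted phase
        self.best_velocity = -math.inf      # memory: velocity seen with that phase

    def propose(self):
        """Monte Carlo move: try a random perturbation of the remembered phase."""
        trial = self.best_phase + self.rng.gauss(0.0, self.step)
        self.phase = trial % (2.0 * math.pi)

    def learn(self, measured_velocity):
        """Keep the trial phase if the sensed velocity did not get worse."""
        if measured_velocity >= self.best_velocity:
            self.best_phase = self.phase
            self.best_velocity = measured_velocity
        else:
            self.phase = self.best_phase    # revert to the remembered phase


def simulate(num_units=5, cycles=2000, seed=0):
    """Run the assembly and return the remembered velocity after each cycle."""
    units = [Unit(random.Random(seed + i)) for i in range(num_units)]
    history = []
    for _ in range(cycles):
        for unit in units:
            unit.propose()
        # The only coupling is physical: every unit senses the same body motion.
        velocity = assembly_velocity([unit.phase for unit in units])
        for unit in units:
            unit.learn(velocity)
        history.append(units[0].best_velocity)
    return history


if __name__ == "__main__":
    trace = simulate()
    print(f"first cycle: {trace[0]:.3f}, last cycle: {trace[-1]:.3f}")
```

In this sketch the stored `best_velocity` is only a valid benchmark while the environment stays the same. If the toy velocity function changed mid-run, the remembered value would go stale, which loosely mirrors the abstract's condition that the memory must be representative of the current environment.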
