Marjaninejad Ali, Urbina-Meléndez Darío, Cohn Brian A, Valero-Cuevas Francisco J
Department of Biomedical Engineering, University of Southern California, Los Angeles, CA, USA.
Department of Electrical Engineering (Systems), University of Southern California, Los Angeles, CA, USA.
Nat Mach Intell. 2019 Mar;1(3):144-154. doi: 10.1038/s42256-019-0029-0. Epub 2019 Mar 11.
Robots will become ubiquitously useful only when they can use few attempts to teach themselves to perform different tasks, even with complex bodies and in dynamical environments. Vertebrates, in fact, use sparse trial-and-error to learn multiple tasks despite their intricate tendon-driven anatomies, which are particularly hard to control because they are simultaneously nonlinear, under-determined, and over-determined. We demonstrate, for the first time in simulation and hardware, how a model-free, open-loop approach allows few-shot autonomous learning to produce effective movements in a 3-tendon 2-joint limb. We use a short period of motor babbling (to create an initial inverse map) followed by building functional habits by reinforcing high-reward behavior and refinements of the inverse map in a movement's neighborhood. This biologically plausible algorithm, which we call G2P (General-to-Particular), can potentially enable quick, robust and versatile adaptation in robots as well as shed light on the foundations of the enviable functional versatility of organisms.
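The two phases named in the abstract (motor babbling to seed an inverse map, then reward-driven attempts that refine the map near promising movements) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `plant` function is a hypothetical stand-in for the 3-tendon 2-joint limb, the `reward`, `goal`, and `threshold` values are invented, and a linear least-squares fit stands in for whatever function approximator the authors use for the inverse map.

```python
import numpy as np


def plant(activations):
    """Hypothetical toy limb: 3 tendon activations -> 2 joint kinematics.
    The learner never looks inside this function (model-free)."""
    A = np.array([[1.0, -0.6, 0.2],
                  [-0.3, 0.8, -0.7]])
    return np.tanh(activations @ A.T)


def design(kin):
    """Append a bias column to a batch of kinematic samples."""
    kin = np.atleast_2d(kin)
    return np.hstack([kin, np.ones((len(kin), 1))])


def fit_inverse_map(kin, act):
    """Least-squares inverse map: kinematics -> activations (assumed linear)."""
    W, *_ = np.linalg.lstsq(design(kin), act, rcond=None)
    return W


def inverse_map(W, kin):
    """Predict activations for desired kinematics, clipped to [0, 1]."""
    return np.clip(design(kin) @ W, 0.0, 1.0)


def reward(kin, goal=np.array([0.3, -0.2])):
    """Invented task reward: negative squared distance to a hidden goal."""
    return -float(np.sum((kin - goal) ** 2))


def g2p_sketch(n_babble=200, n_attempts=15, threshold=-0.05, seed=0):
    rng = np.random.default_rng(seed)

    # Phase 1: motor babbling. Random activations seed the initial inverse map.
    act = rng.uniform(0.0, 1.0, size=(n_babble, 3))
    kin = plant(act)
    W = fit_inverse_map(kin, act)

    # Phase 2: a few reward-driven attempts, each executed open-loop.
    best_r, best_q, history = -np.inf, None, []
    for _ in range(n_attempts):
        if best_r < threshold:
            # No good behavior yet: explore desired kinematics broadly.
            q_des = rng.uniform(-1.0, 1.0, size=2)
        else:
            # Reinforce the best behavior by sampling in its neighborhood.
            q_des = best_q + rng.normal(0.0, 0.1, size=2)
        a = inverse_map(W, q_des)   # commanded activations, shape (1, 3)
        q = plant(a)                # observed kinematics, shape (1, 2)
        r = reward(q[0])
        if r > best_r:
            best_r, best_q = r, q_des
        history.append(best_r)
        # Refine the inverse map with the data from this attempt.
        act = np.vstack([act, a])
        kin = np.vstack([kin, q])
        W = fit_inverse_map(kin, act)

    return {"W": W, "best_reward": best_r,
            "history": history, "n_samples": len(act)}
```

Calling `g2p_sketch()` returns the refined map, the best reward found, and the running best-reward history. Because each attempt adds its own data to the fit, the map gains samples around the movements actually tried, which is the "general to particular" idea in miniature.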