Google Brain, Tokyo, Japan.
Artif Life. 2019 Fall;25(4):352-365. doi: 10.1162/artl_a_00301. Epub 2019 Nov 7.
In many reinforcement learning tasks, the goal is to learn a policy to manipulate an agent, whose design is fixed, to maximize some notion of cumulative reward. The design of the agent's physical structure is rarely optimized for the task at hand. In this work, we explore the possibility of learning a version of the agent's design that is better suited for its task, jointly with the policy. We propose an alteration to the popular OpenAI Gym framework, where we parameterize parts of an environment, and allow an agent to jointly learn to modify these environment parameters along with its policy. We demonstrate that an agent can learn a better structure of its body that is not only better suited for the task, but also facilitates policy learning. Joint learning of policy and structure may even uncover design principles that are useful for assisted-design applications.
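The idea of parameterizing part of the environment and learning those parameters together with the policy can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy 1-D task, the names `ParamEnv`, `set_design`, `leg_length`, and `joint_hill_climb`, and the use of simple hill climbing as a stand-in optimizer. It does not use the OpenAI Gym API and is not the method described in the paper.

```python
import random


class ParamEnv:
    """Hypothetical toy task: end as close as possible to `target` after `steps` moves."""

    def __init__(self, target=10.0, steps=20):
        self.target = target
        self.steps = steps
        self.leg_length = 1.0  # design parameter (assumed for illustration)

    def set_design(self, design):
        # Clamp the design so it stays within an assumed plausible range.
        self.leg_length = min(2.0, max(0.25, design[0]))

    def rollout(self, policy):
        # The policy is a single proportional gain; actions are clipped to [-1, 1].
        gain = policy[0]
        pos = 0.0
        for _ in range(self.steps):
            action = max(-1.0, min(1.0, gain * (self.target - pos)))
            pos += self.leg_length * action  # step size depends on the body design
        return -abs(self.target - pos)  # reward: negative final distance to target


def evaluate(env, params):
    # One flat vector holds both policy and design parameters.
    env.set_design(params[1:])
    return env.rollout(params[:1])


def joint_hill_climb(env, iterations=300, sigma=0.1, seed=0):
    rng = random.Random(seed)
    best = [0.0, 0.3]  # [policy gain, leg length]
    best_score = evaluate(env, best)
    for _ in range(iterations):
        # Perturb policy and design together, keep the candidate if it is no worse.
        cand = [p + rng.gauss(0.0, sigma) for p in best]
        score = evaluate(env, cand)
        if score >= best_score:
            best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    params, score = joint_hill_climb(ParamEnv())
    print("gain, leg_length:", params, "score:", score)
```

In this sketch the starting leg length of 0.3 caps total travel at 6 units over 20 steps, so no policy alone can reach the target at 10; the search has to change the design as well, which mirrors the abstract's claim that a better-suited body can make the task easier for the policy.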