Gupta Agrim, Savarese Silvio, Ganguli Surya, Fei-Fei Li
Department of Computer Science, Stanford University, Stanford, CA, USA.
Department of Applied Physics, Stanford University, Stanford, CA, USA.
Nat Commun. 2021 Oct 6;12(1):5721. doi: 10.1038/s41467-021-25874-z.
The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, because performing large-scale in silico experiments on evolution and learning is challenging. Here, we introduce Deep Evolutionary Reinforcement Learning (DERL): a computational framework that can evolve diverse agent morphologies to learn challenging locomotion and manipulation tasks in complex environments. Leveraging DERL, we demonstrate several relations between environmental complexity, morphological intelligence, and the learnability of control. First, environmental complexity fosters the evolution of morphological intelligence, as quantified by the ability of a morphology to facilitate the learning of novel tasks. Second, we demonstrate a morphological Baldwin effect: in our simulations, evolution rapidly selects morphologies that learn faster, thereby enabling behaviors learned late in the lifetime of early ancestors to be expressed early in the descendants' lifetime. Third, we suggest a mechanistic basis for the above relationships through the evolution of morphologies that are more physically stable and energy efficient, and can therefore facilitate learning and control.