Xiong Tongzhao, Liu Zhaorong, Wang Yufei, Ong Chong Jin, Zhu Lailai
Department of Mechanical Engineering, National University of Singapore, Singapore, 117575, Singapore.
Department of Physics, University of Science and Technology of China, Hefei, Anhui, 230026, People's Republic of China.
Nat Commun. 2025 Jul 1;16(1):5441. doi: 10.1038/s41467-025-60646-z.
Microorganisms have evolved diverse strategies to propel themselves in viscous fluids, navigate complex environments, and exhibit taxis in response to stimuli. This has inspired the development of miniature robots, where artificial intelligence (AI) is playing an increasingly important role. Can AI endow these synthetic systems with intelligence akin to that honed through natural evolution? Here, we demonstrate, in silico, chemotactic navigation in a multi-link robotic model using two-level hierarchical reinforcement learning (RL). The lower-level RL allows the model, configured as a chain or ring topology, to acquire topology-adapted swimming gaits: wave propagation characteristic of flagella or body oscillation akin to that of an amoeba. Such chain and ring swimmers, further enabled by the higher-level RL, accomplish chemotactic navigation in prototypical biologically relevant scenarios that include conflicting chemoattractants, pursuit of a swimming bacterial mimic, steering in vortical flows, and squeezing through tight constrictions. Additionally, we achieve reset-free RL under partial observability, where simulated robots rely solely on local scalar observations rather than global or vectorial data. This advancement illuminates potential solutions for overcoming persistent challenges of manual resets and partial observability in real-world microrobotic RL.