Reinforcement learning for a biped robot based on a CPG-actor-critic method.

Authors

Nakamura Yutaka, Mori Takeshi, Sato Masa-aki, Ishii Shin

Affiliation

Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.

Publication information

Neural Netw. 2007 Aug;20(6):723-35. doi: 10.1016/j.neunet.2007.01.002. Epub 2007 Feb 20.

DOI:10.1016/j.neunet.2007.01.002

PMID:17412559

Abstract

Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.
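As a rough illustration of the kind of oscillatory signal a CPG produces, the sketch below integrates a two-neuron mutual-inhibition oscillator. The abstract does not say which oscillator model the authors use, so the Matsuoka-style equations, the parameter values, and the function names `simulate_matsuoka` and `count_sign_changes` are assumptions made here for illustration only. This is not the paper's controller, and it includes no learning.

```python
def simulate_matsuoka(steps=10000, dt=0.01, tau=1.0, tau_adapt=12.0,
                      beta=2.5, w=2.5, c=1.0):
    """Euler-integrate a two-neuron Matsuoka-style oscillator (assumed model).

    Returns the list of outputs y1 - y2, one value per step.
    """
    u = [0.1, 0.0]  # membrane states; the asymmetric start breaks symmetry
    v = [0.0, 0.0]  # adaptation (fatigue) states
    outputs = []
    for _ in range(steps):
        # Rectified firing rates of the two neurons.
        y = [max(0.0, u[0]), max(0.0, u[1])]
        # Each neuron is driven by a constant input c, inhibited by the
        # other neuron (weight w), and damped by its own adaptation (beta).
        du = [(-u[i] - beta * v[i] - w * y[1 - i] + c) / tau for i in range(2)]
        dv = [(-v[i] + y[i]) / tau_adapt for i in range(2)]
        for i in range(2):
            u[i] += dt * du[i]
            v[i] += dt * dv[i]
        outputs.append(y[0] - y[1])
    return outputs


def count_sign_changes(signal):
    """Count how often the signal switches between positive and negative."""
    changes = 0
    last = 0
    for x in signal:
        s = (x > 0) - (x < 0)
        if s != 0:
            if last != 0 and s != last:
                changes += 1
            last = s
    return changes


if __name__ == "__main__":
    out = simulate_matsuoka()
    print("sign changes in second half:", count_sign_changes(out[5000:]))
```

In a CPG-based controller, an alternating output of this kind would typically be mapped to joint commands, while a learning rule adjusts the oscillator's parameters or inputs. How the CPG-actor-critic method does this is described in the paper itself, not in this sketch.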

Similar articles

Reinforcement learning of motor skills with policy gradients.

Neural Netw. 2008 May;21(4):682-97. doi: 10.1016/j.neunet.2008.02.003. Epub 2008 Apr 26.

Central pattern generators for locomotion control in animals and robots: a review.

Neural Netw. 2008 May;21(4):642-53. doi: 10.1016/j.neunet.2008.03.014. Epub 2008 May 14.

A reflexive neural network for dynamic biped walking control.

Neural Comput. 2006 May;18(5):1156-96. doi: 10.1162/089976606776241057.

Towards a general neural controller for quadrupedal locomotion.

Neural Netw. 2008 May;21(4):667-81. doi: 10.1016/j.neunet.2008.03.010. Epub 2008 Apr 27.

A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.

Biosystems. 2004 Nov;77(1-3):109-17. doi: 10.1016/j.biosystems.2004.05.001.

Neuromuscular control of the point to point and oscillatory movements of a sagittal arm with the actor-critic reinforcement learning method.

Comput Methods Biomech Biomed Engin. 2005 Apr;8(2):103-13. doi: 10.1080/10255840500167952.

A hybrid CPG-ZMP control system for stable walking of a simulated flexible spine humanoid robot.

Neural Netw. 2010 Apr;23(3):452-60. doi: 10.1016/j.neunet.2009.11.003. Epub 2009 Dec 3.

Smooth transition for CPG-based body shape control of a snake-like robot.

Bioinspir Biomim. 2014 Mar;9(1):016003. doi: 10.1088/1748-3182/9/1/016003. Epub 2013 Dec 16.

CPG-inspired workspace trajectory generation and adaptive locomotion control for quadruped robots.

IEEE Trans Syst Man Cybern B Cybern. 2011 Jun;41(3):867-80. doi: 10.1109/TSMCB.2010.2097589. Epub 2011 Jan 6.

Cited by

Hierarchical reinforcement learning with central pattern generator for enabling a quadruped robot simulator to walk on a variety of terrains.

Sci Rep. 2025 Apr 2;15(1):11262. doi: 10.1038/s41598-025-94163-2.

Bio-inspired neural networks with central pattern generators for learning multi-skill locomotion.

Sci Rep. 2025 Mar 24;15(1):10165. doi: 10.1038/s41598-025-94408-0.

Robust and reusable self-organized locomotion of legged robots under adaptive physical and neural communications.

Front Neural Circuits. 2023 Mar 31;17:1111285. doi: 10.3389/fncir.2023.1111285. eCollection 2023.

Exploring Behaviors of Caterpillar-Like Soft Robots with a Central Pattern Generator-Based Controller and Reinforcement Learning.

Soft Robot. 2019 Oct;6(5):579-594. doi: 10.1089/soro.2018.0126. Epub 2019 May 20.

Optimal path for controlling pollution emissions in the Chinese electric power industry considering technological heterogeneity.

Environ Sci Pollut Res Int. 2019 Apr;26(11):11087-11099. doi: 10.1007/s11356-019-04526-2. Epub 2019 Feb 21.

Fast Dynamical Coupling Enhances Frequency Adaptation of Oscillators for Robotic Locomotion Control.

Front Neurorobot. 2017 Mar 21;11:14. doi: 10.3389/fnbot.2017.00014. eCollection 2017.

A novel approach to locomotion learning: Actor-Critic architecture using central pattern generators and dynamic motor primitives.

Front Neurorobot. 2014 Oct 2;8:23. doi: 10.3389/fnbot.2014.00023. eCollection 2014.

Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture.

Front Neurorobot. 2013 Apr 8;7:5. doi: 10.3389/fnbot.2013.00005. eCollection 2013.

A neurorobotic platform to test the influence of neuromodulatory signaling on anxious and curious behavior.

Front Neurorobot. 2013 Feb 5;7:1. doi: 10.3389/fnbot.2013.00001. eCollection 2013.
