Zhang Yilin, Sun Huimin, Sun Honglin, Huang Yuan, Hashimoto Kenji
Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan.
Biomimetics (Basel). 2024 Jun 8;9(6):346. doi: 10.3390/biomimetics9060346.
As technology rapidly evolves, the application of bipedal robots across diverse environments has expanded widely. Compared with their wheeled counterparts, these robots have more degrees of freedom and more complex control, which makes maintaining balance and stability under changing wind speeds particularly difficult. Overcoming this challenge is critical: it enables bipedal robots to sustain more stable gaits during outdoor tasks, increasing safety and improving operational efficiency in outdoor settings. To overcome the limitations of existing methods, this research introduces an adaptive bio-inspired exploration framework, based on the Deep Deterministic Policy Gradient (DDPG) approach, for bipedal robots subject to wind disturbances. The framework allows a robot to perceive its body state through wind-force inputs and to adaptively adjust its exploration coefficient. Additionally, to address the convergence difficulties posed by sparse rewards, this study incorporates Hindsight Experience Replay (HER) and a reward-reshaping strategy to provide safer and more effective training guidance for the agents. Simulation results show that robots using the proposed method explore stabilizing behaviors more quickly in complex conditions and achieve improvements in training speed and walking distance over the standard DDPG algorithm.
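The two key ideas in the abstract can be illustrated with a minimal sketch: an exploration-noise scale that adapts to the sensed wind force, and a HER-style relabeling of a failed episode so that the state actually reached becomes the goal. The function names, the noise schedule, and the tolerance values below are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def adaptive_sigma(wind_force, sigma_min=0.05, sigma_max=0.4, wind_ref=10.0):
    """Hypothetical schedule: scale the exploration noise std with the
    sensed wind magnitude, so stronger disturbances widen exploration."""
    w = min(abs(wind_force) / wind_ref, 1.0)  # normalize and clamp to [0, 1]
    return sigma_min + (sigma_max - sigma_min) * w

def explore(action, wind_force, rng):
    """Perturb the deterministic DDPG action with Gaussian noise whose
    scale is set by the wind-dependent exploration coefficient."""
    sigma = adaptive_sigma(wind_force)
    noisy = action + rng.normal(0.0, sigma, size=action.shape)
    return np.clip(noisy, -1.0, 1.0)  # keep actions in the valid range

def her_relabel(episode):
    """HER 'final' strategy: relabel each transition with the goal the
    episode actually achieved, turning a sparse failure into a success."""
    achieved = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        # Sparse reward: 0 on reaching the (relabeled) goal, -1 otherwise.
        r = 0.0 if np.allclose(t["achieved_goal"], achieved, atol=1e-3) else -1.0
        relabeled.append({**t, "goal": achieved, "reward": r})
    return relabeled
```

In this sketch, calm conditions keep exploration narrow (sigma near `sigma_min`), while gusts near `wind_ref` widen it toward `sigma_max`; the relabeled transitions would be stored in the replay buffer alongside the originals.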