Dai Huatong, Chen Pengzhan, Yang Hui
School of Electrical Engineering and Automation, East China Jiaotong University, Nanchang 330013, China.
Sensors (Basel). 2022 Jan 22;22(3):845. doi: 10.3390/s22030845.
Using reinforcement learning (RL) for torque distribution of skid steering vehicles has attracted increasing attention recently. Various RL-based torque distribution methods have been proposed to deal with this classical vehicle control problem, achieving better performance than traditional control methods. However, most RL-based methods focus only on improving the performance of skid steering vehicles, while actuator faults that may lead to unsafe conditions or catastrophic events are frequently omitted from existing control schemes. This study proposes a meta-RL-based fault-tolerant control (FTC) method to improve the tracking performance of vehicles in the presence of actuator faults. Built on the meta deep deterministic policy gradient (meta-DDPG) algorithm, the proposed FTC method follows a representative gradient-based meta-learning workflow consisting of an offline stage and an online stage. In the offline stage, an experience replay buffer covering various actuator faults is constructed to provide data for training the meta-training model; the meta-trained model is then used in an online meta-RL update to quickly adapt the control policy to actuator fault conditions. Simulations of four scenarios demonstrate that the proposed FTC method achieves high performance and adapts stably to actuator fault conditions.
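To make the two-stage workflow concrete, the following is a minimal PyTorch sketch of a gradient-based meta-update over actuator-fault tasks. The network shapes, the per-fault replay buffers, and the Reptile-style first-order outer loop are illustrative assumptions rather than the authors' exact meta-DDPG formulation, and critic (TD) training is omitted for brevity.

```python
# Hypothetical sketch of the offline/online meta-RL workflow described in the
# abstract. Network sizes, the fault-task interface, and the Reptile-style
# first-order meta-update are assumptions, not the paper's exact algorithm.
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps vehicle state to a tanh-bounded wheel-torque distribution."""
    def __init__(self, state_dim=8, action_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q(s, a) estimate used for the DDPG policy-gradient step."""
    def __init__(self, state_dim=8, action_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def inner_update(actor, critic, batch, lr=1e-3, steps=5):
    """Adapt a copy of the actor to one fault task via DDPG policy steps."""
    adapted = copy.deepcopy(actor)
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)
    states, _ = batch
    for _ in range(steps):
        loss = -critic(states, adapted(states)).mean()  # ascend Q(s, pi(s))
        opt.zero_grad(); loss.backward(); opt.step()
    return adapted

def meta_train(actor, critic, fault_tasks, meta_lr=0.1, epochs=100):
    """Offline stage: Reptile-style outer loop over fault scenarios."""
    for _ in range(epochs):
        for sample_batch in fault_tasks:   # each task = one fault condition
            adapted = inner_update(actor, critic, sample_batch())
            with torch.no_grad():
                # Move meta-parameters toward the task-adapted parameters.
                for p, q in zip(actor.parameters(), adapted.parameters()):
                    p += meta_lr * (q - p)
    return actor

if __name__ == "__main__":
    actor, critic = Actor(), Critic()
    # Stand-in replay buffers: one sampler per simulated fault condition.
    tasks = [lambda: (torch.randn(32, 8), None) for _ in range(3)]
    meta_train(actor, critic, tasks, epochs=10)
    # Online stage: a few gradient steps adapt to the fault encountered.
    deployed = inner_update(actor, critic, (torch.randn(32, 8), None), steps=3)
```

The key design point this sketch illustrates is that the offline meta-update does not optimize for any single fault; it seeks an initialization from which a handful of online gradient steps suffice to recover tracking performance under a new fault condition.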