Trajectory Tracking Controller for Quadrotor by Continual Reinforcement Learning in Wind-Disturbed Environment

Author information

Liu Yanhui, Hao Lina, Wang Shuopeng, Wang Xu

Affiliation

School of Mechanical Engineering and Automation, Northeastern University, Shenyang 110819, China.

Publication information

Sensors (Basel). 2025 Aug 8;25(16):4895. doi: 10.3390/s25164895.

PMID:40871755

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12390258/

Abstract

The extensive deployment of quadrotors in complex environmental missions has revealed a critical challenge: trajectory tracking accuracy degrades under time-varying wind disturbances. Conventional model-based controllers struggle to adapt to nonlinear wind field dynamics, while data-driven approaches often suffer from catastrophic forgetting, which compromises environmental adaptability. This paper proposes a continual reinforcement learning framework that integrates the continual backpropagation algorithm with reinforcement learning to improve the robustness of quadrotor tracking in dynamic wind fields. A foundation model is first trained in wind-free conditions. As the wind disturbance intensity then varies gradually, a neuron utility assessment mechanism dynamically resets low-utility neurons to maintain network plasticity. In addition, a multi-objective reward function is designed to improve both training precision and efficiency. The framework was validated on the Gazebo/PX4 simulation platform under stepwise-increasing and stochastically varying wind disturbances, where it achieved a lower root mean square error of trajectory tracking than the standard PPO algorithm. By resetting neurons in a structured way, the proposed framework addresses the plasticity loss problem in deep reinforcement learning and significantly enhances the continual adaptation capabilities of quadrotors in dynamic wind fields.
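The neuron utility assessment and reset step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the utility measure, so the code assumes a contribution-style utility (mean absolute activation times summed absolute outgoing weight, tracked as a running average), and the class name `UtilityResetLayer` and the parameters `decay`, `maturity`, and `reset_fraction` are hypothetical.

```python
import numpy as np


class UtilityResetLayer:
    """One hidden layer with a running per-neuron utility and selective reset.

    Assumed utility (not taken from the paper): running average of
    mean |activation| * sum |outgoing weights| for each hidden neuron.
    """

    def __init__(self, n_in, n_hidden, n_out, decay=0.99, maturity=100,
                 reset_fraction=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w_in = self.rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.w_out = self.rng.normal(0.0, 0.1, size=(n_hidden, n_out))
        self.utility = np.zeros(n_hidden)          # running utility per neuron
        self.age = np.zeros(n_hidden, dtype=int)   # forward passes since last reset
        self.decay = decay
        self.maturity = maturity
        self.reset_fraction = reset_fraction

    def forward(self, x):
        """Forward pass on a batch x of shape (batch, n_in); updates utilities."""
        h = np.maximum(0.0, x @ self.w_in)  # ReLU activations, (batch, n_hidden)
        contribution = np.abs(h).mean(axis=0) * np.abs(self.w_out).sum(axis=1)
        self.utility = self.decay * self.utility + (1.0 - self.decay) * contribution
        self.age += 1
        return h @ self.w_out

    def reset_low_utility(self):
        """Reinitialize the lowest-utility mature neurons; return their indices."""
        mature = np.flatnonzero(self.age > self.maturity)
        n_reset = int(self.reset_fraction * mature.size)
        if n_reset == 0:
            return np.array([], dtype=int)
        worst = mature[np.argsort(self.utility[mature])[:n_reset]]
        # Fresh incoming weights restore plasticity; zero outgoing weights
        # keep the reset from perturbing the current policy output.
        self.w_in[:, worst] = self.rng.normal(
            0.0, 0.1, size=(self.w_in.shape[0], worst.size))
        self.w_out[worst, :] = 0.0
        self.utility[worst] = 0.0
        self.age[worst] = 0
        return worst
```

In a PPO-style training loop, one plausible arrangement would be to call `reset_low_utility()` after each policy update, so that only neurons older than `maturity` steps are candidates and newly reset neurons get time to become useful before being evaluated again.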