Qi Yuanhang, Hu Jintao, Wang Fujie, Huang Gewen
School of Computer Science, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528402, China.
College of Excellent Engineers, Dongguan University of Technology, Dongguan 523820, China.
Biomimetics (Basel). 2025 Sep 4;10(9):591. doi: 10.3390/biomimetics10090591.
Unmanned aerial vehicles (UAVs) often face significant challenges in trajectory tracking within complex dynamic environments, where uncertainties, external disturbances, and nonlinear dynamics hinder accurate and stable control. To address this issue, a bio-inspired deep reinforcement learning (DRL) algorithm is proposed, integrating behavior cloning (BC) and long short-term memory (LSTM) networks. The method learns a high-precision control policy autonomously, without requiring an accurate system dynamics model. Motivated by the memory and prediction functions of biological neural systems, an LSTM module is embedded into the policy network of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. This structure captures temporal state patterns more effectively, enhancing adaptability to trajectory variations and resilience to delays and disturbances. Compared to memoryless networks, the LSTM-based design better replicates biological time-series processing, improving tracking stability and accuracy. In addition, behavior cloning is employed to pre-train the DRL policy on expert demonstrations, mimicking the way animals learn from observation. This biologically plausible initialization accelerates convergence by reducing inefficient early-stage exploration. By combining offline imitation with online learning, the TD3-LSTM-BC framework balances expert guidance and adaptive optimization, analogous to innate and experience-based learning in nature. Simulation results confirm the superior robustness and tracking accuracy of the proposed method, demonstrating its potential as a control solution for autonomous UAVs.
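To make the two key ingredients of the abstract concrete, the sketch below illustrates (a) a recurrent TD3 actor whose LSTM layer consumes a short window of past states, and (b) a behavior-cloning pre-training step that regresses the policy onto expert demonstrations before online learning. This is a minimal illustration assuming a PyTorch implementation; the state/action dimensions, network sizes, window length, and synthetic demonstration data are assumptions for demonstration purposes, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumed, not from the paper):
STATE_DIM, ACTION_DIM, HIDDEN = 12, 4, 64

class LSTMActor(nn.Module):
    """Deterministic TD3-style policy that reads a short window of past
    states, so temporal patterns (delays, disturbances) inform the action."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(STATE_DIM, HIDDEN, batch_first=True)
        self.head = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                                  nn.Linear(HIDDEN, ACTION_DIM), nn.Tanh())

    def forward(self, state_seq):
        # state_seq: (batch, seq_len, STATE_DIM)
        out, _ = self.lstm(state_seq)
        return self.head(out[:, -1])        # act on the final hidden state

def bc_pretrain(actor, expert_states, expert_actions, epochs=50, lr=1e-3):
    """Behavior cloning: fit the policy to expert demonstrations
    before TD3's online interaction begins."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(actor(expert_states), expert_actions)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

if __name__ == "__main__":
    actor = LSTMActor()
    # Synthetic stand-in for an expert trajectory dataset: 256 windows of 8 steps.
    demo_s = torch.randn(256, 8, STATE_DIM)
    demo_a = torch.tanh(torch.randn(256, ACTION_DIM))
    print("BC loss after pre-training:", bc_pretrain(actor, demo_s, demo_a))
```

After this offline warm start, the actor would be handed to a standard TD3 training loop (twin critics, delayed policy updates, target smoothing), which is omitted here for brevity.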