Zhu Hengwei, Rong Chuiting, Liu Haorui
College of Computer and Information Engineering, Dezhou University, Dezhou, 253023, China.
Sci Rep. 2025 Apr 22;15(1):13978. doi: 10.1038/s41598-025-99213-3.
Reinforcement learning algorithms that operate in continuous action spaces suffer from slow convergence and a tendency to become trapped in local optima. We therefore propose a deep deterministic policy gradient algorithm, DBOP-DDPG, that combines the dung beetle optimization algorithm with a prioritized experience replay mechanism. The method first introduces the dung beetle optimizer (DBO), whose multiple populations search simultaneously; this effectively keeps the algorithm from settling on a locally optimal solution and improves its global optimization capability. We then design a criterion for ranking the priority of sample data and improve sampling from the experience replay mechanism: transitions are stored, according to their importance, in three replay buffers that are drawn on during subsequent sampling and training, thereby accelerating convergence. Finally, we tested the method in three classic control environments from OpenAI Gym. The results show that the improved method converges at least 10% faster than the baseline algorithm and increases the cumulative reward by up to 150.
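The three-tier replay mechanism described above is concrete enough to sketch. The following minimal Python illustration is not the authors' implementation: it assumes the priority criterion is the absolute TD error, and the thresholds, capacities, and per-tier sampling ratios are hypothetical placeholders. Transitions are routed into high/mid/low-importance buffers when stored, and mini-batches are drawn from all three tiers.

    import random
    from collections import deque

    class ThreeTierReplay:
        """Minimal sketch of importance-tiered experience replay (assumed design)."""

        def __init__(self, capacity=100_000, hi_err=1.0, lo_err=0.1):
            # Hypothetical |TD error| thresholds separating the three tiers.
            self.hi_err, self.lo_err = hi_err, lo_err
            self.buffers = {
                "high": deque(maxlen=capacity),
                "mid": deque(maxlen=capacity),
                "low": deque(maxlen=capacity),
            }

        def store(self, transition, td_error):
            # Route a (state, action, reward, next_state, done) tuple by importance.
            err = abs(td_error)
            tier = "high" if err >= self.hi_err else "mid" if err >= self.lo_err else "low"
            self.buffers[tier].append(transition)

        def sample(self, batch_size, ratios=(0.5, 0.3, 0.2)):
            # Draw a mixed mini-batch that favors the more important tiers;
            # the 50/30/20 split is a placeholder, not the paper's setting.
            batch = []
            for tier, frac in zip(("high", "mid", "low"), ratios):
                buf = self.buffers[tier]
                n = min(len(buf), int(batch_size * frac))
                if n:
                    batch.extend(random.sample(buf, n))
            pool = [t for buf in self.buffers.values() for t in buf]
            while len(batch) < batch_size and pool:
                batch.append(random.choice(pool))  # top up if some tiers were short
            return batch

In a DDPG training loop, the critic would supply td_error when each transition is stored, and the actor and critic updates would then consume the mixed batches, so that high-error transitions are revisited more often.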