Computer Science Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt.
Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt.
PLoS One. 2021 Jun 10;16(6):e0252754. doi: 10.1371/journal.pone.0252754. eCollection 2021.
Deep Reinforcement Learning (DRL) enables agents to make decisions based on a well-designed reward function that suits a particular environment without any prior knowledge of that environment. The tuning of hyperparameters has a great impact on the overall learning process and the training time. Hyperparameters should be accurately estimated while training DRL algorithms, which is one of the key challenges that we attempt to address. This paper employs a swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm and achieve the optimal control strategy in an autonomous driving control problem. DDPG is capable of handling complex environments with continuous action spaces. To evaluate the proposed algorithm, The Open Racing Car Simulator (TORCS), a realistic autonomous driving simulation environment, was chosen due to its ease of design and implementation. Using TORCS, the DDPG agent with optimized hyperparameters was compared with a DDPG agent using reference hyperparameters. The experimental results showed that optimizing the DDPG's hyperparameters maximizes the total reward across testing episodes while maintaining a stable driving policy.
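The core idea of the approach can be sketched as follows: WOA maintains a population of candidate hyperparameter vectors and updates them with its standard encircling, bubble-net spiral, and random-search moves, scoring each candidate by the total reward a DDPG agent earns when trained with it. The Python sketch below is a minimal illustration of that loop, not the paper's implementation: the hyperparameter set (actor/critic learning rates, discount factor gamma, soft-update rate tau), their bounds, and the evaluate_ddpg stub are all assumptions for illustration; in the paper the evaluation would correspond to training and running DDPG in TORCS.

import numpy as np

# Hypothetical fitness function (assumption, not from the paper): train a
# DDPG agent with the given hyperparameters and return its total reward.
def evaluate_ddpg(hparams):
    actor_lr, critic_lr, gamma, tau = hparams
    raise NotImplementedError("train DDPG in TORCS and return total reward")

def woa_optimize(fitness, bounds, n_whales=10, n_iters=30, seed=0):
    """Minimal Whale Optimization Algorithm (maximizing fitness).

    fitness: maps a hyperparameter vector to a score (higher is better).
    bounds:  sequence of (low, high) pairs, one per hyperparameter.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(bounds)
    # Initialize whale positions uniformly within the search bounds.
    X = rng.uniform(lo, hi, size=(n_whales, dim))
    scores = np.array([fitness(x) for x in X])
    best = X[scores.argmax()].copy()
    best_score = scores.max()

    for t in range(n_iters):
        a = 2 - 2 * t / n_iters              # decreases linearly from 2 to 0
        for i in range(n_whales):
            A = 2 * a * rng.random() - a     # scalar coefficient in [-a, a]
            C = 2 * rng.random(dim)          # per-dimension coefficient in [0, 2]
            if rng.random() < 0.5:
                if abs(A) < 1:
                    # Encircling prey: move toward the best whale found so far.
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:
                    # Exploration: move relative to a randomly chosen whale.
                    rand = X[rng.integers(n_whales)]
                    D = np.abs(C * rand - X[i])
                    X[i] = rand - A * D
            else:
                # Bubble-net attack: logarithmic spiral toward the best whale.
                D = np.abs(best - X[i])
                l = rng.uniform(-1, 1, dim)
                X[i] = D * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)     # keep candidates inside bounds
            s = fitness(X[i])
            if s > best_score:
                best, best_score = X[i].copy(), s
    return best, best_score

Under these assumptions, a call such as woa_optimize(evaluate_ddpg, bounds=[(1e-5, 1e-3), (1e-4, 1e-2), (0.9, 0.999), (1e-3, 1e-1)]) would return the best-scoring hyperparameter vector found; the bounds shown are illustrative, not the ranges used in the paper.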