Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm.

Affiliations

Computer Science Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt.

Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt.

Publication information

PLoS One. 2021 Jun 10;16(6):e0252754. doi: 10.1371/journal.pone.0252754. eCollection 2021.

Abstract

Deep Reinforcement Learning (DRL) enables agents to make decisions based on a well-designed reward function that suits a particular environment, without any prior knowledge of that environment. The choice of hyperparameters has a great impact on the overall learning process and on training time. Hyperparameters should therefore be accurately estimated when training DRL algorithms, which is one of the key challenges that we attempt to address. This paper employs a swarm-based optimization algorithm, the Whale Optimization Algorithm (WOA), to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm and so achieve the optimum control strategy in an autonomous driving control problem. DDPG is capable of handling complex environments with continuous action spaces. To evaluate the proposed algorithm, The Open Racing Car Simulator (TORCS), a realistic autonomous driving simulation environment, was chosen for its ease of design and implementation. Using TORCS, the DDPG agent with optimized hyperparameters was compared with a DDPG agent with reference hyperparameters. The experimental results showed that optimizing DDPG's hyperparameters maximizes the total reward over the testing episodes while maintaining a stable driving policy.
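
To make the idea concrete, the following is a minimal Python sketch of a WOA loop searching a DDPG-style hyperparameter space. It is an illustration under stated assumptions, not the authors' implementation: the hyperparameter names and bounds in `BOUNDS` are hypothetical, `surrogate_reward` is a toy stand-in for "train DDPG in TORCS and return the total reward", and the coefficients `A` and `C` are drawn as scalars per whale rather than per dimension for brevity.

```python
import math
import random

# Hypothetical search space: names and bounds are illustrative assumptions,
# not the values used in the paper.
BOUNDS = {
    "actor_lr": (1e-5, 1e-2),
    "critic_lr": (1e-5, 1e-2),
    "gamma": (0.90, 0.999),
    "tau": (1e-4, 1e-1),
}


def surrogate_reward(params):
    """Toy stand-in for 'train DDPG, return total reward' (higher is better)."""
    target = {"actor_lr": 1e-4, "critic_lr": 1e-3, "gamma": 0.99, "tau": 1e-3}
    return -sum(
        ((params[k] - target[k]) / (BOUNDS[k][1] - BOUNDS[k][0])) ** 2
        for k in BOUNDS
    )


def to_params(vec):
    """Map a position vector back to a named hyperparameter dict."""
    return dict(zip(BOUNDS, vec))


def clip(vec):
    """Keep every coordinate inside its bounds."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(vec, BOUNDS.values())]


def woa(fitness, n_whales=20, n_iters=100, b=1.0, seed=0):
    """Maximize `fitness` with a simplified Whale Optimization Algorithm."""
    rng = random.Random(seed)
    whales = [
        [rng.uniform(lo, hi) for lo, hi in BOUNDS.values()]
        for _ in range(n_whales)
    ]
    best = max(whales, key=lambda w: fitness(to_params(w)))[:]
    best_fit = fitness(to_params(best))

    for t in range(n_iters):
        a = 2.0 * (1 - t / n_iters)  # decreases linearly from 2 toward 0
        for i, x in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale when |A| < 1, otherwise explore
                # around a randomly chosen whale.
                ref = best if abs(A) < 1 else rng.choice(whales)
                new = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Spiral (bubble-net) move toward the best whale.
                l = rng.uniform(-1, 1)
                spiral = math.exp(b * l) * math.cos(2 * math.pi * l)
                new = [abs(bj - xi) * spiral + bj for bj, xi in zip(best, x)]
            whales[i] = clip(new)

        # Update the best solution after all whales have moved.
        for w in whales:
            f = fitness(to_params(w))
            if f > best_fit:
                best, best_fit = w[:], f

    return to_params(best), best_fit
```

Calling `best_params, best_reward = woa(surrogate_reward)` returns the best hyperparameter set found and its surrogate score. In a setting like the one the abstract describes, `surrogate_reward` would presumably be replaced by a function that trains and evaluates a DDPG agent, which makes each fitness evaluation expensive and the population size and iteration count important budget choices.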

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52b4/8191943/0ceb98e790f6/pone.0252754.g001.jpg
