Sumiea Ebrahim Hamid, Abdulkadir Said Jadid, Alhussian Hitham Seddig, Al-Selwi Safwan Mahmood, Alqushaibi Alawi, Ragab Mohammed Gamal, Fati Suliman Mohamed
Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar, 32610, Perak, Malaysia.
Center for Research in Data Science (CeRDaS), Universiti Teknologi PETRONAS, Seri Iskandar, 32610, Perak, Malaysia.
Heliyon. 2024 May 7;10(9):e30697. doi: 10.1016/j.heliyon.2024.e30697. eCollection 2024 May 15.
Deep Reinforcement Learning (DRL) has gained significant adoption in diverse fields and applications, mainly due to its proficiency in resolving complicated decision-making problems in spaces with high-dimensional states and actions. Deep Deterministic Policy Gradient (DDPG) is a well-known DRL algorithm that adopts an actor-critic approach, synthesizing the advantages of value-based and policy-based reinforcement learning methods. The aim of this study is to provide a thorough examination of the latest developments, patterns, obstacles, and potential opportunities related to DDPG. A systematic search was conducted using relevant academic databases (Scopus, Web of Science, and ScienceDirect) to identify 85 relevant studies published in the last five years (2018-2023). We provide a comprehensive overview of the key concepts and components of DDPG, including its formulation, implementation, and training. Then, we highlight the various applications and domains of DDPG, including Autonomous Driving, Unmanned Aerial Vehicles, Resource Allocation, Communications and the Internet of Things, Robotics, and Finance. Additionally, we provide an in-depth comparison of DDPG with other DRL algorithms and traditional RL methods, highlighting its strengths and weaknesses. We believe that this review will be an essential resource for researchers, offering them valuable insights into the methods and techniques utilized in the field of DRL and DDPG.