Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning.

Author Information

Yang Yana, Xi Meng, Dai Huiao, Wen Jiabao, Yang Jiachen

Affiliation

The School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China.

Publication Information

Sensors (Basel). 2024 Dec 4;24(23):7746. doi: 10.3390/s24237746.

Abstract

Reinforcement learning is a machine learning method that requires no pre-collected training data: an agent seeks an optimal policy through continuous interaction with its environment, which makes it an important approach to sequential decision-making problems. Combined with deep learning, deep reinforcement learning gains powerful perception and decision-making capabilities and has been widely applied across domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration from exploitation by storing and replaying interaction experiences, making it easier to find globally optimal solutions. How stored experiences are utilized is therefore crucial to the efficiency of off-policy algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which improves the utilization of experiences and thereby the performance and convergence speed of the algorithm. A series of ablation experiments demonstrates that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms.
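
The record gives only the high-level idea, so the following is a minimal sketch of one plausible reading of z-score prioritization, not the authors' exact method: each transition's absolute TD error is standardized across the buffer, z_i = (|e_i| - mean) / std, and the z-scores are mapped to positive sampling probabilities. The class name, the softmax mapping, and the epsilon floor are illustrative assumptions.

import numpy as np

class ZScoreReplayBuffer:
    """Hypothetical sketch of z-score prioritized experience replay.

    Priorities are derived by standardizing absolute TD errors across
    the whole buffer; details beyond the abstract are assumptions.
    """

    def __init__(self, capacity, eps=1e-6):
        self.capacity = capacity
        self.eps = eps          # keeps std strictly positive
        self.storage = []       # (state, action, reward, next_state, done)
        self.td_errors = []     # last known |TD error| per transition
        self.pos = 0

    def add(self, transition, td_error):
        # Ring-buffer insertion: overwrite oldest entry once full.
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.td_errors.append(abs(td_error))
        else:
            self.storage[self.pos] = transition
            self.td_errors[self.pos] = abs(td_error)
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        errors = np.asarray(self.td_errors, dtype=np.float64)
        # Standardize TD errors across the buffer: z = (e - mean) / std.
        z = (errors - errors.mean()) / (errors.std() + self.eps)
        # Map z-scores to positive probabilities (softmax here; the
        # paper may use a different mapping).
        p = np.exp(z - z.max())
        p /= p.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=p)
        return [self.storage[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Refresh stored TD errors after each learning step.
        for i, e in zip(idx, td_errors):
            self.td_errors[i] = abs(e)

One motivation for standardizing, if this reading is right: raw TD-error magnitudes shift with the reward scale and drift as training progresses, whereas z-scores rank transitions relative to the current buffer distribution, keeping the sampling pressure comparable throughout training.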

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d84b/11645091/bf518bb76e6c/sensors-24-07746-g001.jpg
