
Robust Actor-Critic With Relative Entropy Regulating Actor.

Authors

Cheng Yuhu, Huang Longyang, Chen C L Philip, Wang Xuesong

Publication Information

IEEE Trans Neural Netw Learn Syst. 2023 Nov;34(11):9054-9063. doi: 10.1109/TNNLS.2022.3155483. Epub 2023 Oct 27.

Abstract

Accurately estimating the Q-function and enhancing the agent's exploration ability have long been challenges for off-policy actor-critic algorithms. To address both concerns, a novel robust actor-critic (RAC) algorithm is developed in this article. We first derive a robust policy improvement mechanism (RPIM) that uses the locally optimal policy with respect to the current estimated Q-function to guide policy improvement. By constraining the relative entropy between the new policy and the previous one during policy improvement, the proposed RPIM enhances the stability of the policy update process. Theoretical analysis shows that each policy update carries an incentive to increase the policy entropy, which helps enhance the agent's exploration ability. RAC is then developed by applying the proposed RPIM to regulate the actor improvement process, and the resulting algorithm is proven to be convergent. Finally, the proposed RAC is evaluated on continuous-action control tasks on the MuJoCo platform; the experimental results show that RAC outperforms several state-of-the-art reinforcement learning algorithms.
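The entropy claim follows from a standard identity: KL(pi || pi_old) = -H(pi) - E_pi[log pi_old], so penalizing this relative entropy implicitly rewards policies with higher entropy H(pi), which is the exploration incentive the abstract describes. Below is a minimal sketch of such a KL-regularized actor update, assuming a PyTorch diagonal-Gaussian actor and a critic q_net that maps state-action pairs to Q-values. The names GaussianActor, actor_loss, and kl_coef are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    # Diagonal-Gaussian policy over continuous actions (an assumed
    # architecture, not the one from the paper).
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def dist(self, obs):
        h = self.body(obs)
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        return torch.distributions.Normal(self.mu(h), std)

def actor_loss(actor, old_actor, q_net, obs, kl_coef=0.1):
    # Maximize Q under the new policy while penalizing KL(new || old),
    # i.e., a relative-entropy trust region on the policy update.
    dist = actor.dist(obs)
    with torch.no_grad():
        old_dist = old_actor.dist(obs)   # previous policy, held fixed
    act = dist.rsample()                 # reparameterized sample, so
                                         # gradients flow through the action
    q = q_net(torch.cat([obs, act], dim=-1))
    # Closed-form KL between diagonal Gaussians, summed over action dims.
    kl = torch.distributions.kl_divergence(dist, old_dist).sum(-1)
    return (-q.squeeze(-1) + kl_coef * kl).mean()

In practice, old_actor would be a frozen copy of the current actor refreshed after each update, and the loss minimized with a standard optimizer. The point of the sketch is only the trade-off: a small kl_coef approaches a greedy Q-maximizing update, while a larger one slows the policy change and, through the -H(pi) term inside the KL, rewards exploration.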

