
Anti-Martingale Proximal Policy Optimization.

Author information

Gu Yang, Cheng Yuhu, Yu Kun, Wang Xuesong

Publication information

IEEE Trans Cybern. 2023 Oct;53(10):6421-6432. doi: 10.1109/TCYB.2022.3170355. Epub 2023 Sep 15.

Abstract

In on-policy deep reinforcement learning (DRL), the sample data collected in one exploration process can be used to update the network parameters only once, so high sample efficiency is necessary to accelerate training. This work first proposes a submartingale criterion, based on the equivalence relationship between the optimal policy and a martingale, and then an advanced value iteration (AVI) method that conducts value iteration with high accuracy. On this foundation, an anti-martingale (AM) reinforcement learning framework is established to efficiently select the sample data that are conducive to policy optimization. Next, an AM proximal policy optimization (AMPPO) method, which combines the AM framework with proximal policy optimization (PPO), is proposed to reasonably accelerate the updating of state values that satisfy the submartingale criterion. Experimental results on the MuJoCo platform show that AMPPO can achieve better performance than several state-of-the-art DRL methods used for comparison.
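
For orientation, the sketch below shows the standard PPO clipped surrogate objective that AMPPO is described as building on, with an optional per-sample selection flag. The abstract does not state how the AM framework selects samples or how the submartingale criterion is computed, so the `keep` argument is a purely hypothetical stand-in for that selection step. The function name and default clipping value are likewise assumptions made for illustration, not details from the paper.

```python
from typing import Optional, Sequence


def ppo_clipped_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
    keep: Optional[Sequence[bool]] = None,
) -> float:
    """Mean PPO clipped surrogate over the kept samples.

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for sample i
    keep[i]       = hypothetical selection flag (an assumption for
                    illustration; the paper's actual rule is not given
                    in the abstract)
    """
    if keep is None:
        keep = [True] * len(ratios)

    terms = []
    for r, a, k in zip(ratios, advantages, keep):
        if not k:
            continue  # sample was not selected for this update
        # Clip the probability ratio to [1 - eps, 1 + eps].
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        terms.append(min(r * a, clipped_r * a))

    if not terms:
        return 0.0  # nothing selected, so no contribution
    return sum(terms) / len(terms)


if __name__ == "__main__":
    ratios = [1.5, 0.5, 1.0]
    advantages = [1.0, -1.0, 2.0]
    print(ppo_clipped_objective(ratios, advantages))
    print(ppo_clipped_objective(ratios, advantages, keep=[True, False, True]))
```

In this sketch, dropping a sample only changes which terms enter the average; the clipping itself is unchanged from plain PPO.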
