Scaling Up Q-Learning via Exploiting State-Action Equivalence.

Authors

Lyu Yunlian, Côme Aymeric, Zhang Yijie, Talebi Mohammad Sadegh

Affiliations

Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark.

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Xiyuan Ave., Chengdu 611731, China.

Publication information

Entropy (Basel). 2023 Mar 29;25(4):584. doi: 10.3390/e25040584.

Abstract

Recent success stories in reinforcement learning have demonstrated that leveraging structural properties of the underlying environment is key to devising viable methods capable of solving complex tasks. We study off-policy learning in discounted reinforcement learning, where the environment admits some equivalence relation. We introduce a new model-free algorithm, called QL-ES (Q-learning with equivalence structure), which is a variant of (asynchronous) Q-learning tailored to exploit the equivalence structure in the underlying Markov decision process (MDP). We report a non-asymptotic PAC-type sample complexity bound for QL-ES, thereby establishing its sample efficiency. This bound also allows us to quantify the superiority of QL-ES over Q-learning analytically, and it shows that the theoretical gain in some domains can be massive. We report extensive numerical experiments demonstrating that QL-ES converges significantly faster than (structure-oblivious) Q-learning empirically. These experiments suggest that the empirical performance gain obtained by exploiting the equivalence structure can be massive, even in simple domains. To the best of our knowledge, QL-ES is the first provably efficient model-free algorithm to exploit the equivalence structure in finite MDPs.
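
The abstract does not state the QL-ES update rule, so the following is only a minimal sketch of the general idea of letting equivalent state-action pairs share what is learned. It is not the authors' algorithm. It assumes, more strongly than the paper may require, that equivalent pairs have identical optimal Q-values and can therefore share a single table entry. The names `q_learning_shared` and `class_of`, and the default values of `gamma` and `alpha`, are hypothetical choices made for illustration.

```python
from collections import defaultdict


def q_learning_shared(transitions, class_of, n_actions, gamma=0.9, alpha=0.5):
    """Tabular Q-learning with one table entry per equivalence class.

    transitions: iterable of (s, a, r, s_next) samples from any behavior policy.
    class_of:    hypothetical function mapping (s, a) to an equivalence-class id.
    Assumption:  pairs in the same class share the same Q-value.
    """
    q = defaultdict(float)  # class id -> Q estimate, initialised to 0
    for s, a, r, s_next in transitions:
        key = class_of(s, a)
        # Greedy bootstrap over next actions, looked up through their classes.
        best_next = max(q[class_of(s_next, b)] for b in range(n_actions))
        # Standard Q-learning step, applied to the shared entry.
        q[key] += alpha * (r + gamma * best_next - q[key])
    return q


if __name__ == "__main__":
    # Toy case: every state behaves identically, so the class is just the action.
    samples = [(0, 0, 1.0, 1), (3, 0, 1.0, 2)]
    q = q_learning_shared(samples, lambda s, a: a, n_actions=2)
    print(q[0])  # estimate shared by (s, 0) for every state s
```

In this toy case both samples update the same entry even though they come from different states, which is the intuition behind the faster convergence reported in the abstract: each sample informs every pair in its class rather than a single pair.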

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/608a/10137898/9f30d1a241a5/entropy-25-00584-g001.jpg
