On the convergence of projective-simulation-based reinforcement learning in Markov decision processes.

Authors

Boyajian W L, Clausen J, Trenkwalder L M, Dunjko V, Briegel H J

Affiliations

Institute for Theoretical Physics, University of Innsbruck, 6020 Innsbruck, Austria.

LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands.

Publication information

Quantum Mach Intell. 2020;2(2):13. doi: 10.1007/s42484-020-00023-9. Epub 2020 Nov 5.

DOI:10.1007/s42484-020-00023-9

PMID:33184611

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7644479/

Abstract

In recent years, interest in leveraging quantum effects to enhance machine learning tasks has increased significantly. Many algorithms that speed up supervised and unsupervised learning have been established. Projective simulation was the first framework in which ways were found to exploit quantum resources specifically for the broader context of reinforcement learning. It is an agent-based reinforcement learning approach designed in a manner that may support quantum-walk-based speedups. Although classical variants of projective simulation have been benchmarked against common reinforcement learning algorithms, very few formal theoretical analyses have been provided for its performance in standard learning scenarios. In this paper, we provide a detailed formal discussion of the properties of this model. Specifically, we prove that one version of the projective simulation model, understood as a reinforcement learning approach, converges to optimal behavior in a large class of Markov decision processes. This proof shows that a physically inspired approach to reinforcement learning can be guaranteed to converge.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4e7/7644479/ba6c6002733e/42484_2020_23_Fig1_HTML.jpg

Similar articles

Projective simulation for artificial intelligence.

Sci Rep. 2012;2:400. doi: 10.1038/srep00400. Epub 2012 May 15.

A unified analysis of value-function-based reinforcement-learning algorithms.

Neural Comput. 1999 Nov 15;11(8):2017-59. doi: 10.1162/089976699300016070.

Projective simulation with generalization.

Sci Rep. 2017 Oct 31;7(1):14430. doi: 10.1038/s41598-017-14740-y.

Online learning of shaping rewards in reinforcement learning.

Neural Netw. 2010 May;23(4):541-50. doi: 10.1016/j.neunet.2010.01.001. Epub 2010 Jan 11.

Quantum-accessible reinforcement learning beyond strictly epochal environments.

Quantum Mach Intell. 2021;3(2):22. doi: 10.1007/s42484-021-00049-7. Epub 2021 Aug 2.

PaCAR: COVID-19 Pandemic Control Decision Making via Large-Scale Agent-Based Modeling and Deep Reinforcement Learning.

Med Decis Making. 2022 Nov;42(8):1064-1077. doi: 10.1177/0272989X221107902. Epub 2022 Jul 1.

The Convergence of a Cooperation Markov Decision Process System.

Entropy (Basel). 2020 Aug 30;22(9):955. doi: 10.3390/e22090955.

A delay-robust method for enhanced real-time reinforcement learning.

Neural Netw. 2025 Jan;181:106769. doi: 10.1016/j.neunet.2024.106769. Epub 2024 Oct 1.

Active learning machine learns to create new quantum experiments.

Proc Natl Acad Sci U S A. 2018 Feb 6;115(6):1221-1226. doi: 10.1073/pnas.1714936115. Epub 2018 Jan 18.

Cited by

Learning how to find targets in the micro-world: the case of intermittent active Brownian particles.

Soft Matter. 2024 Feb 28;20(9):2008-2016. doi: 10.1039/d3sm01680c.

Skill Learning by Autonomous Robotic Playing Using Active Learning and Exploratory Behavior Composition.

Front Robot AI. 2020 Apr 3;7:42. doi: 10.3389/frobt.2020.00042. eCollection 2020.

References

Skill Learning by Autonomous Robotic Playing Using Active Learning and Exploratory Behavior Composition.

Front Robot AI. 2020 Apr 3;7:42. doi: 10.3389/frobt.2020.00042. eCollection 2020.

Machine learning & artificial intelligence in the quantum domain: a review of recent progress.

Rep Prog Phys. 2018 Jul;81(7):074001. doi: 10.1088/1361-6633/aab406. Epub 2018 Mar 5.

Active learning machine learns to create new quantum experiments.

Proc Natl Acad Sci U S A. 2018 Feb 6;115(6):1221-1226. doi: 10.1073/pnas.1714936115. Epub 2018 Jan 18.

Projective simulation with generalization.

Sci Rep. 2017 Oct 31;7(1):14430. doi: 10.1038/s41598-017-14740-y.

Quantum machine learning.

Nature. 2017 Sep 13;549(7671):195-202. doi: 10.1038/nature23474.

Quantum-Enhanced Machine Learning.

Phys Rev Lett. 2016 Sep 23;117(13):130501. doi: 10.1103/PhysRevLett.117.130501. Epub 2016 Sep 20.

On creative machines and the physical origins of freedom.

Sci Rep. 2012;2:522. doi: 10.1038/srep00522. Epub 2012 Jul 20.

Projective simulation for artificial intelligence.

Sci Rep. 2012;2:400. doi: 10.1038/srep00400. Epub 2012 May 15.
