The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality.

Author Information

Omslandseter Rebekka Olsson, Jiao Lei, Zhang Xuan, Yazidi Anis, Oommen B John

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Jun;35(6):8278-8292. doi: 10.1109/TNNLS.2022.3226538. Epub 2024 Jun 3.

Abstract

Since the early 1960s, the paradigm of learning automata (LA) has attracted abundant interest. Arguably, it has also served as a foundation for the field of reinforcement learning (RL). Over the decades, new concepts and fundamental principles have been introduced to increase the speed and accuracy of LA. These include the use of probability-updating functions, the discretization of the probability space, and the "Pursuit" concept. Very recently, incorporating "structure" into the ordering of the LA's actions has improved both the speed and the accuracy of the corresponding hierarchical machines when the number of actions is large, leading to the ϵ-optimal hierarchical continuous pursuit LA (HCPA). This article pioneers the inclusion of all of the above-mentioned phenomena in a single new LA, the novel hierarchical discretized pursuit LA (HDPA). Although the previously proposed HCPA is powerful, its speed suffers whenever any action probability approaches unity, because the updates to the components of the probability vector become correspondingly smaller. We propose here the novel HDPA, in which the phenomenon of discretization is infused into the action probability vector's updating functionality and invoked recursively at every stage of the machine's hierarchical structure. The discretized updates do not suffer from the same impediment, because discretization bounds each update step from below. We demonstrate the HDPA's robustness and validity by formally proving its ϵ-optimality using the moderation property, and we invoke the submartingale characteristic at every level to prove that the action probability of the optimal action converges to unity as time goes to infinity. Apart from the new machine being ϵ-optimal, the numerical results demonstrate that the number of iterations required for convergence is significantly reduced for the HDPA when compared to the state-of-the-art HCPA scheme.
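The discretized pursuit update that the abstract contrasts with the continuous HCPA update can be illustrated with a short sketch. Below is a minimal, single-level Python sketch of a discretized pursuit automaton; the HDPA applies such an update recursively at every level of its hierarchy, which this sketch omits. The function name, the Bernoulli-environment interface, and the resolution step `delta = 1/(r*N)` are illustrative assumptions, not the authors' implementation:

```python
import random

def discretized_pursuit_step(p, d_hat, counts, rewards, delta, env):
    """One iteration of a single-level discretized pursuit LA (sketch)."""
    r = len(p)
    # 1. Select an action according to the action probability vector p.
    a = random.choices(range(r), weights=p)[0]
    # 2. Query the environment: beta = 1 on reward, 0 on penalty.
    beta = env(a)
    # 3. Update the maximum-likelihood reward estimate of the chosen action.
    counts[a] += 1
    rewards[a] += beta
    d_hat[a] = rewards[a] / counts[a]
    # 4. "Pursue" the currently best-estimated action: shift probability
    #    mass toward it in fixed discrete steps of size delta, so the step
    #    size does not shrink as the leading probability approaches unity.
    best = max(range(r), key=lambda i: d_hat[i])
    for i in range(r):
        if i != best:
            p[i] -= min(delta, p[i])  # never drop below zero
    p[best] = 1.0 - sum(p[i] for i in range(r) if i != best)
    return a, beta
```

Because the step size `delta` is fixed rather than proportional to the remaining probability mass, convergence does not stall as the best action's probability nears unity, which is the impediment the abstract attributes to the continuous-update HCPA.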

