Jiang Wei-Cheng, Narayanan Vignesh, Li Jr-Shin
IEEE Trans Cybern. 2021 Dec;51(12):5717-5727. doi: 10.1109/TCYB.2019.2958912. Epub 2021 Dec 22.
A demanding task for a reinforcement learning agent in an uncertain environment is to quickly learn a policy, or a sequence of actions, with which it can achieve the desired goal. In this article, we present an incremental model learning scheme to reconstruct the model of a stochastic environment. In the proposed learning scheme, we introduce a clustering algorithm to assimilate the model information and estimate the probability of each state transition. In addition, utilizing the reconstructed model, we present an experience replay strategy that creates virtual interactive experiences by balancing exploration and exploitation, which greatly accelerates learning and enables planning. Furthermore, we extend the proposed learning scheme to a multiagent framework to decrease the exploration effort and reduce the learning time in a large environment. In this multiagent framework, we introduce a knowledge-sharing algorithm to share the reconstructed model information among the different agents, as needed, and develop a computationally efficient knowledge-fusing mechanism that fuses the knowledge acquired from an agent's own experience with the knowledge received from its teammates. Finally, simulation results with comparative analysis are provided to demonstrate the efficacy of the proposed methods in complex learning tasks.
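The abstract does not give the details of the incremental model learning scheme; as a rough illustration, the core idea of reconstructing a stochastic environment's model by estimating transition probabilities from observed data can be sketched as a count-based estimator (the `EmpiricalModel` class and its methods are hypothetical names, and the paper's clustering algorithm for assimilating model information is not reproduced here):

```python
from collections import defaultdict

class EmpiricalModel:
    """Incrementally estimates P(s' | s, a) from observed transitions.

    A minimal count-based sketch; each observed transition (s, a, s')
    increments a counter, and probabilities are the normalized counts.
    """
    def __init__(self):
        # (s, a) -> {s': number of times s' followed (s, a)}
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, s_next):
        """Assimilate one observed transition into the model."""
        self.counts[(s, a)][s_next] += 1

    def transition_probs(self, s, a):
        """Return the estimated distribution over successor states."""
        dist = self.counts[(s, a)]
        total = sum(dist.values())
        return {sp: n / total for sp, n in dist.items()} if total else {}

# Example: after seeing "s1" twice and "s2" once from ("s0", "up"),
# the estimate is s1 with probability 2/3 and s2 with probability 1/3.
model = EmpiricalModel()
for s_next in ["s1", "s1", "s2"]:
    model.update("s0", "up", s_next)
probs = model.transition_probs("s0", "up")
```

Because the estimate is updated one transition at a time, the agent never has to store or replay its raw experience to maintain the model.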
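The experience replay strategy described in the abstract uses the reconstructed model to generate virtual interactive experiences. A generic Dyna-style sketch of this idea follows, assuming the learned model is a dictionary of transition distributions and rewards are stored in a separate lookup table; the epsilon parameter and the sampling rule are illustrative stand-ins for the paper's exploration–exploitation balance, not the authors' actual mechanism:

```python
import random

def virtual_replay(Q, model, rewards, actions, n_steps=50,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
    """Dyna-style planning sketch: sample previously seen (s, a) pairs,
    draw a successor from the learned transition distribution, and apply
    a Q-learning backup to the virtual transition.

    model   : {(s, a): {s_next: probability}} learned from real experience
    rewards : {(s, a, s_next): r} hypothetical reward lookup table
    epsilon : probability of replaying a random seen pair (exploration)
              rather than the highest-valued one (exploitation)
    """
    seen = list(model)  # (s, a) pairs with estimated dynamics
    for _ in range(n_steps):
        if random.random() < epsilon:
            s, a = random.choice(seen)                       # explore
        else:
            s, a = max(seen, key=lambda sa: Q.get(sa, 0.0))  # exploit
        dist = model[(s, a)]
        s_next = random.choices(list(dist), weights=dist.values())[0]
        r = rewards.get((s, a, s_next), 0.0)
        best_next = max((Q.get((s_next, b), 0.0) for b in actions),
                        default=0.0)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
            r + gamma * best_next - Q.get((s, a), 0.0))
    return Q

# Toy usage: a single rewarded transition; repeated virtual backups
# raise its Q-value without any further real interaction.
random.seed(0)
toy_model = {("s0", "go"): {"goal": 1.0}}
toy_rewards = {("s0", "go", "goal"): 1.0}
Q = virtual_replay({}, toy_model, toy_rewards, actions=["go"], n_steps=10)
```

The appeal of this class of methods is that each real transition can be replayed many times through the model, which is what "greatly accelerates learning and enables planning" refers to.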
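For the multiagent setting, the abstract mentions a computationally efficient mechanism for fusing an agent's own model knowledge with knowledge received from teammates. One simple way such a fusion can work, when the shared knowledge takes the form of transition counts as in the sketch above, is additive merging of the count tables (this is an assumed illustration, not the paper's mechanism):

```python
def fuse_counts(own, received):
    """Fuse transition counts from an agent's own experience with counts
    shared by a teammate, by adding the per-successor tallies.

    own, received : {(s, a): {s_next: count}}
    Returns a new fused table; neither input is modified.
    """
    fused = {sa: dict(d) for sa, d in own.items()}
    for sa, d in received.items():
        slot = fused.setdefault(sa, {})
        for sp, n in d.items():
            slot[sp] = slot.get(sp, 0) + n
    return fused

# Example: the teammate contributes both extra evidence for a known
# transition and dynamics for a state-action pair this agent never tried.
own = {("s0", "u"): {"s1": 2}}
received = {("s0", "u"): {"s1": 1, "s2": 3}, ("s1", "u"): {"s2": 1}}
fused = fuse_counts(own, received)
```

Merging counts rather than normalized probabilities keeps the fusion a single pass over the shared table and automatically weights each agent's estimate by how much evidence it has gathered.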