Wu Jingda, Huang Zhiyu, Huang Wenhui, Lv Chen
IEEE Trans Neural Netw Learn Syst. 2022 Jun 10;PP. doi: 10.1109/TNNLS.2022.3177685.
Reinforcement learning (RL) requires careful problem definition and substantial computational effort to solve optimization and control problems, which can limit its practical prospects. Introducing human guidance into RL is a promising way to improve learning performance. In this article, a comprehensive human-guidance-based RL framework is established. A novel prioritized experience replay mechanism that adapts to human guidance during the RL process is proposed to boost the efficiency and performance of the RL algorithm. To relieve the heavy workload on human participants, a behavior model is established based on an incremental online learning method to mimic human actions. We design two challenging autonomous driving tasks to evaluate the proposed algorithm. Experiments are conducted to assess the training and testing performance and the learning mechanism of the proposed algorithm. Comparative results against state-of-the-art methods suggest the advantages of our algorithm in terms of learning efficiency, performance, and robustness.
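To give a concrete sense of the idea behind a human-guidance-adapted prioritized experience replay, the sketch below shows a minimal proportional-priority buffer in which transitions collected under human guidance receive a boosted sampling priority. This is an illustrative assumption, not the paper's exact mechanism: the class name, the `human_boost` multiplier, and the priority rule are all hypothetical.

```python
import random


class HumanGuidedReplayBuffer:
    """Minimal sketch of prioritized experience replay where
    human-guided transitions get a boosted priority.
    Illustrative only; not the paper's actual algorithm."""

    def __init__(self, capacity=10000, human_boost=2.0, eps=1e-3):
        self.capacity = capacity
        self.human_boost = human_boost  # assumed multiplier for human-guided samples
        self.eps = eps                  # keeps every priority strictly positive
        self.data = []
        self.priorities = []

    def add(self, transition, td_error, from_human):
        # Proportional prioritization: priority grows with |TD error|.
        p = abs(td_error) + self.eps
        if from_human:
            p *= self.human_boost  # human-guided data is replayed more often
        if len(self.data) >= self.capacity:
            # Drop the oldest transition when the buffer is full.
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority.
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=weights, k=batch_size)
        return [self.data[i] for i in idx]
```

With equal TD errors, a human-guided transition is simply `human_boost` times more likely to be drawn than an agent-generated one; annealing that multiplier over training would shift sampling back toward the agent's own experience.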