Lin Zeyang, Lai Jun, Chen Xiliang, Cao Lei, Wang Jun
Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China.
Entropy (Basel). 2022 Dec 6;24(12):1787. doi: 10.3390/e24121787.
As deep reinforcement learning continues to advance in intelligent control, combining automatic curriculum learning with deep reinforcement learning can improve the training performance and efficiency of algorithms by progressing from easy tasks to difficult ones. Most existing automatic curriculum learning algorithms rank curricula using expert experience and a single network, which makes curriculum task ranking difficult and convergence slow. In this paper, we propose a curriculum reinforcement learning method based on K-fold cross-validation that estimates a relative difficulty score for each curriculum task. Drawing on the human practice of learning from easy to difficult, the method divides automatic curriculum learning into a curriculum difficulty assessment stage and a curriculum sorting stage. By training teacher models in parallel and cross-evaluating the difficulty of task samples, the method can better sequence curriculum learning tasks. Finally, comparative simulation experiments were carried out in two types of multi-agent environments. The results show that the K-fold cross-validation-based automatic curriculum learning method improves the training speed of the MADDPG algorithm and, to a certain extent, generalizes to multi-agent deep reinforcement learning algorithms based on a replay buffer mechanism.
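To make the two-stage idea concrete, here is a minimal Python sketch of K-fold difficulty scoring followed by easy-to-hard sorting, assuming hypothetical stand-ins `train_teacher` and `evaluate_return` for the teacher-training and task-evaluation steps (the abstract does not specify the authors' actual API, and the paper trains the teachers in parallel rather than serially as done here):

```python
import numpy as np

def kfold_difficulty_scores(tasks, train_teacher, evaluate_return, k=5, seed=0):
    """Estimate a relative difficulty score for each task via K-fold
    cross-evaluation: split the task set into k folds, train a teacher
    on the other k-1 folds, then score each held-out task by the return
    that unseen teacher achieves on it (lower return -> harder task).

    `train_teacher` and `evaluate_return` are hypothetical callables,
    not the authors' actual implementation.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(tasks))
    folds = np.array_split(order, k)
    scores = np.empty(len(tasks))
    for i, held_out in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        teacher = train_teacher([tasks[j] for j in train_idx])
        for j in held_out:
            # The held-out teacher's average return serves as the
            # cross-validated relative difficulty estimate for task j.
            scores[j] = evaluate_return(teacher, tasks[j])
    return scores

def sort_curriculum(tasks, scores):
    # Easy-to-hard ordering: tasks with higher held-out return first.
    return [tasks[j] for j in np.argsort(-scores)]
```

Because each task is scored only by teachers that never trained on it, the score reflects transferable difficulty rather than memorization, which is what motivates the cross-validation step over a single-network ranking.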