Mahmoud Sara, Billing Erik, Svensson Henrik, Thill Serge
Interaction Lab, School of Informatics, University of Skövde, Skövde, Sweden.
Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, Netherlands.
Front Artif Intell. 2023 Jan 25;6:1098982. doi: 10.3389/frai.2023.1098982. eCollection 2023.
Learning only from data collected in the real world can be unrealistic and time-consuming in many scenarios. One alternative is to use synthetic data as learning environments, allowing rare situations to be learned, and replay buffers to speed up learning. In this work, we examine how the way the environment is created, through auto-generated environment mechanisms, affects the training of a reinforcement learning agent. We take autonomous vehicles as an application domain. We compare the effect of two approaches to generating training data for artificial cognitive agents. We consider the added value of curriculum learning (just as in human learning) as a way to structure novel training data that the agent has not seen before, as well as that of using a replay buffer to train further on data the agent has already seen. In other words, the focus of this paper is on characteristics of the training data rather than on learning algorithms. We therefore use two tasks that are commonly trained early on in autonomous vehicle research: lane keeping and pedestrian avoidance. Our main results show that curriculum learning indeed offers an additional benefit over a vanilla reinforcement learning approach (using Deep Q-Learning), but the replay buffer actually has a detrimental effect in most (but not all) combinations of data generation approaches we considered here. The benefit of curriculum learning does depend on the existence of a well-defined difficulty metric with which various training scenarios can be ordered. In the lane-keeping task, we can define this metric as a function of the curvature of the road: the steeper and more frequent the curves, the more difficult the scenario. Defining such a difficulty metric in other scenarios is not always trivial. In general, the results of this paper emphasize both the importance of considering data characterization, such as curriculum learning, and the importance of defining an appropriate metric for the task.
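To make the curvature-based difficulty metric concrete, the sketch below orders auto-generated road scenarios from easy to hard before training, as a curriculum would present them. This is an illustrative assumption, not the authors' implementation: the scenario representation, the scoring that combines maximum curve steepness with the fraction of curved segments, and all names are made up for the example.

```python
# Illustrative sketch (not the paper's implementation): ordering auto-generated
# road scenarios by a curvature-based difficulty score for curriculum learning.
from dataclasses import dataclass
from typing import List


@dataclass
class RoadScenario:
    """An auto-generated road described by the curvature of its segments (1/m)."""
    name: str
    segment_curvatures: List[float]  # absolute curvature per road segment


def difficulty(scenario: RoadScenario, curve_threshold: float = 0.01) -> float:
    """Difficulty grows with steeper curves and with how often curves occur."""
    curvatures = [abs(c) for c in scenario.segment_curvatures]
    max_steepness = max(curvatures, default=0.0)
    curve_fraction = (
        sum(c > curve_threshold for c in curvatures) / len(curvatures)
        if curvatures else 0.0
    )
    # Equal weighting of steepness and curve frequency is an arbitrary choice here.
    return max_steepness + curve_fraction


def build_curriculum(scenarios: List[RoadScenario]) -> List[RoadScenario]:
    """Order scenarios from easy to hard, as a curriculum would present them."""
    return sorted(scenarios, key=difficulty)


if __name__ == "__main__":
    scenarios = [
        RoadScenario("gentle_bends", [0.002, 0.004, 0.003]),
        RoadScenario("straight_road", [0.0, 0.0, 0.0]),
        RoadScenario("hairpin_heavy", [0.03, 0.05, 0.02, 0.04]),
    ]
    for s in build_curriculum(scenarios):
        print(f"{s.name}: difficulty={difficulty(s):.3f}")
```

The point of the sketch is only that curriculum learning presupposes such an ordering; as the abstract notes, a comparable metric may be much harder to define for tasks like pedestrian avoidance.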