Dirgová Luptáková Iveta, Kubovčík Martin, Pospíchal Jiří
Institute of Computer Technologies and Informatics, Faculty of Natural Sciences, University of Ss. Cyril and Methodius, J. Herdu 2, 917 01 Trnava, Slovakia.
Sensors (Basel). 2024 Mar 16;24(6):1905. doi: 10.3390/s24061905.
A transformer neural network is employed in this study to predict Q-values in a simulated environment using reinforcement learning techniques. The goal is to teach an agent to navigate and excel at the game Flappy Bird, which has become a popular benchmark for control in machine learning. Unlike most top-performing existing approaches, which take the game's rendered image as input, our main contribution lies in using sensory input from LIDAR, simulated with the ray casting method. Specifically, we focus on capturing the temporal context of the ray casting measurements and on optimizing potentially risky behavior by taking into account how closely the agent approaches objects identified as obstacles. The agent learned to use the ray casting measurements to avoid collisions with obstacles. Our model substantially outperforms related approaches. Going forward, we aim to apply this approach in real-world scenarios.
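The abstract does not specify the sensor geometry, network inputs, or reward terms. The following is therefore only a minimal sketch of the three ideas it names, under stated assumptions. It shows simulated LIDAR via ray casting, a proximity-based penalty for risky behavior, and a short history of scans that gives a sequence model temporal context. It assumes obstacles are axis-aligned rectangles (as Flappy Bird pipe segments are). All names (`ray_rect_distance`, `cast_rays`, `proximity_penalty`, `ScanHistory`) and parameters (`n_rays`, `fov`, `max_range`, `safe_dist`, `weight`, `k`) are hypothetical. This is not the authors' implementation.

```python
import math
from collections import deque


def ray_rect_distance(ox, oy, dx, dy, rect):
    """Distance along the ray (ox, oy) + t*(dx, dy), t >= 0, to an axis-aligned
    rectangle (x_min, y_min, x_max, y_max), or math.inf on a miss (slab method).
    Assumes (dx, dy) is a unit vector, so t is a Euclidean distance."""
    x_min, y_min, x_max, y_max = rect
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in ((ox, dx, x_min, x_max), (oy, dy, y_min, y_max)):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it can only hit if the origin lies inside it.
            if o < lo or o > hi:
                return math.inf
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            if t1 > t2:
                t1, t2 = t2, t1
            t_near = max(t_near, t1)
            t_far = min(t_far, t2)
            if t_near > t_far:
                return math.inf
    return t_near


def cast_rays(ox, oy, obstacles, n_rays=8, fov=math.pi, max_range=200.0):
    """Hypothetical simulated LIDAR scan: n_rays (>= 2) rays spread evenly over fov,
    centred on the +x (flight) direction; each reading is clipped to max_range."""
    readings = []
    for i in range(n_rays):
        angle = -fov / 2 + fov * i / (n_rays - 1)
        dx, dy = math.cos(angle), math.sin(angle)
        hit = min((ray_rect_distance(ox, oy, dx, dy, r) for r in obstacles), default=math.inf)
        readings.append(min(hit, max_range))
    return readings


def proximity_penalty(readings, safe_dist=20.0, weight=0.1):
    """Hypothetical reward-shaping term: zero when every reading is at least safe_dist,
    growing linearly to -weight as the closest obstacle distance approaches zero."""
    closest = min(readings)
    return -weight * max(0.0, (safe_dist - closest) / safe_dist)


class ScanHistory:
    """Keeps the last k scans as a (k x n_rays) sequence for a sequence model."""

    def __init__(self, k=4):
        self.scans = deque(maxlen=k)

    def push(self, readings):
        self.scans.append(list(readings))
        # At episode start, pad by repeating the oldest scan (one common choice).
        while len(self.scans) < self.scans.maxlen:
            self.scans.appendleft(self.scans[0])
        return list(self.scans)
```

For example, `cast_rays(0.0, 0.0, [(50.0, -10.0, 60.0, 10.0)], n_rays=3)` returns `[200.0, 50.0, 200.0]`: only the forward ray hits the obstacle, and the two vertical rays are clipped to `max_range`. Each stacked sequence from `ScanHistory.push` could serve as the token sequence a transformer attends over when estimating Q-values. The paper's actual ray layout, temporal encoding, and reward design may differ.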