Nguyen Quang Dang, Prokopenko Mikhail
Centre for Complex Systems, Faculty of Engineering, University of Sydney, Sydney, NSW, Australia.
Front Robot AI. 2020 Sep 16;7:123. doi: 10.3389/frobt.2020.00123. eCollection 2020.
We describe and evaluate a neural network-based architecture designed to imitate and improve the performance of a fully autonomous soccer team in the RoboCup Soccer 2D Simulation environment. The approach uses a deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), which enables a search for local maxima during the training phase. This extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution.