Zhu Jiangcheng, Zhu Jun, Wang Zhepei, Guo Shan, Xu Chao
IEEE Trans Neural Netw Learn Syst. 2019 Feb;30(2):464-473. doi: 10.1109/TNNLS.2018.2844466. Epub 2018 Jul 2.
This paper proposes a hierarchical decision-making and control algorithm for the shepherd game, the seventh mission of the International Aerial Robotics Competition (IARC). In this game, the agent (a multirotor aerial robot) must contact targets (ground vehicles) one after another and drive them across a designated boundary to score points. Throughout the 10-min game, the agent must be fully autonomous, with no human intervention. Because of the agent's lower-level controller and dynamics, each action takes some time to complete. This duration, referred to in this paper as the action delay, is not constant and affects the final reward. The challenge is therefore to make the agent "aware of time" when it applies an action. We address this problem with two approaches: deep Q-networks and a lookup table. The action delay predictor at the decision level is fitted using the lower-level controller. Simulations of the shepherd game validate the effectiveness and efficiency of this approach. This work helped our team win first prize at IARC 2017 and holds the best record for this mission since its release in 2013.
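To illustrate what being "aware of time" can mean for a value-based agent, the following is a minimal sketch and not the authors' implementation. It assumes a tabular Q-function and a semi-Markov-style update in which the discount is raised to the power of the predicted action delay, so that slower actions are valued less for the same outcome. The abstract does not state that the paper uses this exact rule. The names `predicted_delay` and `smdp_q_update`, the straight-line delay model, and all numeric values are hypothetical.

```python
from collections import defaultdict

GAMMA = 0.99  # assumed per-second discount factor (hypothetical value)


def predicted_delay(distance_m, speed_mps=2.0, overhead_s=1.0):
    """Hypothetical action-delay predictor.

    Stand-in for a predictor fitted to the lower-level controller:
    a fixed overhead plus straight-line travel time.
    """
    return overhead_s + distance_m / speed_mps


def smdp_q_update(q, state, action, reward, delay_s,
                  next_state, next_actions, alpha=0.1):
    """One tabular Q-learning step with a duration-dependent discount.

    The bootstrap term is discounted by GAMMA ** delay_s instead of a
    constant GAMMA, so longer actions shrink the future value more.
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + (GAMMA ** delay_s) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Two actions reach the same next state, but one takes much longer.
q = defaultdict(float)
q[("s1", "a")] = 10.0
short = smdp_q_update(q, "s0", "near", 0.0, predicted_delay(2.0), "s1", ["a"])
long = smdp_q_update(q, "s0", "far", 0.0, predicted_delay(20.0), "s1", ["a"])
```

In this sketch the nearby target receives the larger value estimate after one update, purely because its predicted delay is shorter. A deep Q-network variant would replace the table with a function approximator while keeping the same duration-dependent target.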