Xu Can, Zhang Yin, Wang Weigang, Dong Ligang
School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China.
School of Information and Electronic Engineering, Sussex Artificial Intelligence Institute, Zhejiang Gongshang University, Hangzhou, China.
Front Bioeng Biotechnol. 2022 Mar 22;10:827408. doi: 10.3389/fbioe.2022.827408. eCollection 2022.
Since the emergence of deep neural networks (DNNs), they have achieved excellent performance across various research areas. Combining DNNs with reinforcement learning, deep reinforcement learning (DRL) has become a new paradigm for solving differential game problems. In this study, we build a reinforcement learning environment and apply relevant DRL methods to a specific bio-inspired differential game: the dog sheep game. The environment is set on a circle, where the dog chases the sheep as it attempts to escape. Under certain presuppositions, we derive the kinematic pursuit and evasion strategies. This study then applies the value-based deep Q network (DQN) model and the deep deterministic policy gradient (DDPG) model to the dog sheep game, aiming to endow the sheep with the ability to escape successfully. To enhance the performance of the DQN model, this study introduces a reward mechanism with a time-out strategy and a game environment with an attenuation mechanism for the sheep's steering angle. These modifications effectively increase the sheep's probability of escape. Furthermore, the DDPG model is adopted for its continuous action space. Results show that the modifications to the DQN model raise the escape probability to the same level as the DDPG model. Regarding learning ability under various environment difficulties, the refined DQN and DDPG models achieve greater performance improvements over the naive evasion model in harsh environments than in loose ones.
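The dog-sheep game described above can be sketched as a minimal episodic environment: the sheep moves freely inside a circular arena while the dog runs along the rim, and the episode ends on escape, capture, or time-out. All parameter names, speeds, and the greedy dog policy below are illustrative assumptions, not the authors' actual environment.

```python
import math

class DogSheepEnv:
    """Minimal sketch of a dog-sheep pursuit game on a circle (assumed setup).

    The sheep starts at the centre and picks a heading each tick; the dog
    moves along the circumference toward the sheep's angular position.
    The episode terminates when the sheep reaches the rim or on time-out,
    mirroring the time-out strategy mentioned in the abstract.
    """

    def __init__(self, radius=1.0, sheep_speed=0.02, dog_speed=0.08, max_steps=500):
        self.radius = radius            # radius of the circular arena
        self.sheep_speed = sheep_speed  # sheep step length per tick
        self.dog_speed = dog_speed      # dog arc length per tick (rim only)
        self.max_steps = max_steps      # time-out: episode fails if exceeded
        self.reset()

    def reset(self):
        self.sheep = (0.0, 0.0)   # sheep starts at the centre
        self.dog_angle = 0.0      # dog starts on the rim at angle 0
        self.steps = 0
        return self._state()

    def _state(self):
        x, y = self.sheep
        return (x, y, self.dog_angle)

    def step(self, heading):
        """heading: sheep's movement direction in radians (continuous action)."""
        x, y = self.sheep
        x += self.sheep_speed * math.cos(heading)
        y += self.sheep_speed * math.sin(heading)
        self.sheep = (x, y)

        # Greedy dog: turn along the rim toward the sheep's angular position.
        sheep_angle = math.atan2(y, x)
        diff = (sheep_angle - self.dog_angle + math.pi) % (2 * math.pi) - math.pi
        max_turn = self.dog_speed / self.radius
        self.dog_angle += max(-max_turn, min(max_turn, diff))

        self.steps += 1
        if math.hypot(x, y) >= self.radius:
            # Sheep reaches the rim: escape succeeds only if the dog is
            # farther away (along the rim) than one sheep step.
            gap = abs((sheep_angle - self.dog_angle + math.pi)
                      % (2 * math.pi) - math.pi)
            reward = 1.0 if gap * self.radius > self.sheep_speed else -1.0
            return self._state(), reward, True
        if self.steps >= self.max_steps:
            return self._state(), -1.0, True  # time-out counts as failure
        return self._state(), 0.0, False
```

Running straight at the dog's starting point illustrates a failed escape: the dog never has to move, so the sheep is caught at the rim.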
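The steering-angle attenuation mentioned in the abstract can be read as a schedule that narrows the sheep's permitted turn range as the episode proceeds, shrinking the effective action space the DQN must explore. A one-function sketch, where the exponential decay schedule and the parameter values are assumptions rather than the paper's actual settings:

```python
import math

def max_steer(step, base_angle=math.pi / 4, decay=0.99):
    """Upper bound on the sheep's steering change at a given step.

    Exponential decay attenuates the allowed turn over time, so early
    steps permit sharp turns while later steps force smoother headings.
    """
    return base_angle * decay ** step
```

A discrete-action DQN could then map each action index to a turn in `[-max_steer(t), +max_steer(t)]`, so the same action set covers a progressively narrower angular range.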