Li Wei, Liu Yi, Ma Yan, Xu Kang, Qiu Jiang, Gan Zhongxue
Academy for Engineering and Technology, Fudan University, Shanghai, China.
Ji Hua Laboratory, Department of Engineering Research Center for Intelligent Robotics, Foshan, China.
Front Neurorobot. 2023 Jul 6;17:1039644. doi: 10.3389/fnbot.2023.1039644. eCollection 2023.
This paper proposes a self-learning Monte Carlo tree search algorithm (SL-MCTS) that continuously improves its problem-solving ability in single-player scenarios. SL-MCTS combines the MCTS algorithm with a two-branch neural network (PV-Network). The MCTS architecture balances exploration and exploitation during the search. PV-Network replaces the rollout process of MCTS and predicts both the promising search direction and the value of nodes, which increases the convergence speed and search efficiency of MCTS. The paper also proposes an effective method for assessing the trajectory of the current model during the self-learning process: the performance of the current model is compared with that of its best-performing historical model. This method additionally encourages SL-MCTS to generate optimal solutions during self-learning. We evaluate SL-MCTS on a robot path planning scenario. The experimental results show that SL-MCTS far outperforms the traditional MCTS and single-player MCTS algorithms in path quality and time consumption; in particular, its time consumption is reduced by half compared with the traditional MCTS algorithms. SL-MCTS also performs comparably to other iteration-based search algorithms designed specifically for path planning tasks.
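The core idea of replacing the rollout with a network evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: the grid world, the `pv_network_stub` function (a hand-written stand-in for a trained PV-Network), the AlphaZero-style PUCT selection rule, and the constant `c_puct` are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from dataclasses import dataclass, field

# Hypothetical 4x4 grid world with 4-connected moves (illustrative only).
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]
SIZE, GOAL = 4, (3, 3)

def legal_moves(pos):
    return [m for m in MOVES
            if 0 <= pos[0] + m[0] < SIZE and 0 <= pos[1] + m[1] < SIZE]

def pv_network_stub(pos):
    """Stand-in for a trained PV-Network: uniform policy, distance-based value."""
    moves = legal_moves(pos)
    priors = {m: 1.0 / len(moves) for m in moves}
    dist = abs(GOAL[0] - pos[0]) + abs(GOAL[1] - pos[1])
    value = 1.0 - dist / (2 * (SIZE - 1))  # 1.0 at the goal, 0.0 at the far corner
    return priors, value

@dataclass
class Node:
    pos: tuple
    prior: float = 1.0
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent, child, c_puct=1.5):
    # Exploitation (mean value) plus a prior-weighted exploration bonus.
    # PUCT is an assumed selection rule, not one stated in the abstract.
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + u

def simulate(root, max_depth=12):
    node, path = root, [root]
    # Selection: descend through expanded nodes until a leaf, the goal, or the depth cap.
    while node.children and node.pos != GOAL and len(path) < max_depth:
        parent = node
        node = max(parent.children.values(), key=lambda ch: puct_score(parent, ch))
        path.append(node)
    # Evaluation: the network's value estimate takes the place of a random rollout.
    priors, value = pv_network_stub(node.pos)
    # Expansion: create children with the network's policy output as priors.
    if node.pos != GOAL and not node.children:
        for move, p in priors.items():
            nxt = (node.pos[0] + move[0], node.pos[1] + move[1])
            node.children[move] = Node(nxt, prior=p)
    # Backup: single-player setting, so the value is not negated between levels.
    for n in path:
        n.visits += 1
        n.value_sum += value

def best_move(start, n_simulations=200):
    root = Node(start)
    for _ in range(n_simulations):
        simulate(root)
    # Choose the most-visited child of the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Calling `best_move((3, 2))` runs 200 simulations from the cell next to the goal and returns the move with the highest visit count. In a self-learning setting, the visit counts and search outcomes would serve as training targets for the network; that training loop is omitted here.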