Navigation College, Dalian Maritime University, Dalian 116026, China.
Sensors (Basel). 2022 Jul 31;22(15):5732. doi: 10.3390/s22155732.
With the development of artificial intelligence technology, behavior decision-making for the smart marine autonomous surface ship (SMASS) has become particularly important. This research proposed a local path planning and behavior decision-making approach based on improved Proximal Policy Optimization (PPO), which could drive an unmanned SMASS to its target without requiring any human experience. In addition, generalized advantage estimation was added to the loss function of the PPO algorithm, which allowed the baseline in the PPO algorithm to be self-adjusted. First, the SMASS was modeled with the Nomoto model in a simulated waterway. Then, distances, obstacles, and prohibited areas were regularized as rewards or punishments, which were used to judge the performance of the vessel's maneuvering decisions. Subsequently, the improved PPO was introduced to learn the action-reward model, and the neural network model obtained after training was used to control the SMASS's movement. To achieve higher reward values, the SMASS could find an appropriate path or navigation strategy by itself. After a sufficient number of training rounds, a convincing path and maneuvering strategy would likely be produced. Compared with existing methods, the proposed approach is more effective in self-learning and continuous optimization and is thus closer to human maneuvering.
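To make the simulation setup concrete, the sketch below shows a first-order Nomoto yaw model (T·dr/dt + r = K·δ) integrated with Euler steps, together with a reward that regularizes distance to the goal, obstacles, and prohibited areas into rewards or punishments. This is a minimal illustration rather than the authors' code; the gain K, time constant T, ship speed, and penalty weights are assumed values for demonstration only.

```python
import math

K, T, DT = 0.2, 10.0, 1.0   # Nomoto gain, time constant, timestep (assumed values)

def nomoto_step(x, y, psi, r, rudder, speed=5.0):
    """One Euler step of the first-order Nomoto model: T*dr/dt + r = K*rudder."""
    r += DT * (K * rudder - r) / T        # yaw-rate dynamics
    psi += DT * r                         # heading update
    x += DT * speed * math.cos(psi)       # constant-speed kinematics
    y += DT * speed * math.sin(psi)
    return x, y, psi, r

def reward(x, y, goal, obstacles, forbidden):
    """Regularize distance, obstacles, and prohibited areas into one scalar reward."""
    d_goal = math.hypot(goal[0] - x, goal[1] - y)
    rew = -0.01 * d_goal                  # progress term: closer to the goal is better
    for ox, oy, radius in obstacles:      # obstacles as (x, y, radius) circles
        if math.hypot(ox - x, oy - y) < radius:
            rew -= 10.0                   # collision punishment
    if forbidden(x, y):                   # caller-supplied prohibited-area test
        rew -= 5.0                        # no-go area punishment
    return rew
```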
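The abstract's key algorithmic change is adding generalized advantage estimation (GAE) to the PPO loss so that the baseline self-adjusts through the learned value function. The following hedged sketch shows the standard GAE recursion feeding the clipped PPO surrogate loss; gamma, lam, and clip_eps are common default values assumed here, not parameters reported in the paper.

```python
import torch

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t); A_t = sum_k (gamma*lam)^k delta_{t+k}.
    The learned value function V serves as the self-adjusting baseline."""
    adv, gae = torch.zeros_like(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0  # terminal bootstrap = 0
        delta = rewards[t] + gamma * next_v - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv

def ppo_clip_loss(new_logp, old_logp, adv, clip_eps=0.2):
    """Clipped surrogate objective: -E[min(rho*A, clip(rho, 1-eps, 1+eps)*A)]."""
    ratio = torch.exp(new_logp - old_logp)        # probability ratio rho
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()  # minimize negative surrogate
```

In this formulation the GAE advantages replace raw returns in the surrogate loss, which is what lets the baseline adapt during training instead of being fixed by hand.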