Risk-Sensitive Markov Decision Processes of USV Trajectory Planning with Time-Limited Budget.

Author information

Ding Yi, Zhu Hongyang

Affiliations

Maritime College, Guangdong Ocean University, Zhanjiang 524091, China.

College of Mathematics and Computer, Guangdong Ocean University, Zhanjiang 524091, China.

Publication information

Sensors (Basel). 2023 Sep 13;23(18):7846. doi: 10.3390/s23187846.

DOI:10.3390/s23187846

PMID:37765903

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10535518/

Abstract

Trajectory planning is crucial to the safe navigation of ships, as it involves complex decision making influenced by many factors. This paper presents a heuristic algorithm, named the Markov decision process Heuristic Algorithm (MHA), for time-optimized collision avoidance of Unmanned Surface Vehicles (USVs) based on a Risk-Sensitive Markov decision process model. The method uses this model to generate a set of states within the USV collision avoidance search space. These states are determined from the reachable locations and directions, taking into account the time cost associated with the set of actions. By incorporating an enhanced reward function and a constrained time-dependent cost function, the USV can plan practical motion paths that respect its actual time constraints. Experimental results demonstrate that MHA enables decision makers to evaluate the trade-off between the budget and the probability of achieving the goal within that budget. Moreover, the local stochastic optimization criterion helps the agent select collision avoidance paths without significantly increasing the risk of collision.
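The trade-off between the time budget and the probability of reaching the goal within it can be illustrated with a toy example. The sketch below is not the paper's MHA. It is a minimal budget-indexed dynamic program over a hypothetical one-dimensional corridor. The action names (`fast`, `safe`), their time costs, the outcome probabilities, and the function `success_probability` are all assumptions made for illustration only.

```python
# Toy illustration only: a hypothetical corridor with the goal three cells away.
GOAL = 3

# Hypothetical actions: name -> (time cost, [(probability, step)]).
# A step of None stands for a collision, which ends the run in failure.
ACTIONS = {
    "fast": (1, [(0.8, 1), (0.1, 0), (0.1, None)]),  # cheap but risky
    "safe": (2, [(1.0, 1)]),                          # slow but certain
}


def success_probability(max_budget):
    """Return table[b][pos]: the best achievable probability of reaching
    GOAL from cell pos when b time units remain."""
    table = [[0.0] * (GOAL + 1) for _ in range(max_budget + 1)]
    for b in range(max_budget + 1):
        table[b][GOAL] = 1.0  # already at the goal
        for pos in range(GOAL):
            best = 0.0
            for cost, outcomes in ACTIONS.values():
                if cost > b:
                    continue  # action does not fit in the remaining budget
                # Every cost is at least 1, so row b - cost is already filled.
                value = sum(
                    p * table[b - cost][pos + step]
                    for p, step in outcomes
                    if step is not None  # collisions contribute zero
                )
                best = max(best, value)
            table[b][pos] = best
    return table


if __name__ == "__main__":
    table = success_probability(6)
    for budget in range(7):
        print(budget, round(table[budget][0], 3))
```

In this toy model a budget of 2 gives no chance of success, a budget of 3 forces three consecutive `fast` moves (success probability 0.8^3 = 0.512), and a budget of 6 allows three `safe` moves and guarantees success. The resulting curve of probability against budget is the kind of trade-off the abstract refers to, although the paper's actual state space, rewards, and cost functions are not given in this excerpt.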
