Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, MI 49931, USA.
Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA.
Sensors (Basel). 2018 Nov 9;18(11):3859. doi: 10.3390/s18113859.
This work studies online learning-based trajectory planning for multiple autonomous underwater vehicles (AUVs) to estimate a water parameter field of interest in the under-ice environment. A centralized system is considered, where several fixed access points on the ice layer are introduced as gateways for communications between the AUVs and a remote data fusion center. We model the water parameter field of interest as a Gaussian process with unknown hyper-parameters. The AUV trajectories for sampling are determined on an epoch-by-epoch basis. At the end of each epoch, the access points relay the observed field samples from all the AUVs to the fusion center, which computes the posterior distribution of the field based on the Gaussian process regression and estimates the field hyper-parameters. The optimal trajectories of all the AUVs in the next epoch are determined to maximize a long-term reward that is defined based on the field uncertainty reduction and the AUV mobility cost, subject to the kinematics constraint, the communication constraint and the sensing area constraint. We formulate the adaptive trajectory planning problem as a Markov decision process (MDP). A reinforcement learning-based online learning algorithm is designed to determine the optimal AUV trajectories in a constrained continuous space. Simulation results show that the proposed learning-based trajectory planning algorithm has performance similar to a benchmark method that assumes perfect knowledge of the field hyper-parameters.
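As a rough illustration of the per-epoch update described in the abstract, the sketch below computes a Gaussian process posterior from field samples and scores an epoch by the reduction in total posterior variance minus a mobility cost. It is a minimal sketch under stated assumptions, not the authors' method: the squared-exponential kernel, the fixed hyper-parameter values, the names `rbf_kernel`, `gp_posterior` and `epoch_reward`, and the linear path-length cost with weight `cost_weight` are all hypothetical choices that the abstract does not specify. Hyper-parameter estimation, the kinematic, communication and sensing-area constraints, and the reinforcement learning step are omitted.

```python
import numpy as np


def rbf_kernel(a, b, signal_var=1.0, length_scale=1.0):
    """Squared-exponential covariance between point sets a (n, d) and b (m, d).

    The kernel family is an assumption; the abstract only states that the
    field is modeled as a Gaussian process.
    """
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise_var=1e-2,
                 signal_var=1.0, length_scale=1.0):
    """Posterior mean and pointwise variance of the field at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs, signal_var, length_scale)
    k_oo += noise_var * np.eye(len(x_obs))  # observation noise on the diagonal
    k_qo = rbf_kernel(x_query, x_obs, signal_var, length_scale)

    # Cholesky factorization for a numerically stable solve.
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_qo @ alpha

    v = np.linalg.solve(chol, k_qo.T)
    var = signal_var - np.sum(v ** 2, axis=0)  # prior variance minus explained part
    return mean, var


def epoch_reward(var_before, var_after, path_length, cost_weight=0.1):
    """Hypothetical reward: total uncertainty reduction minus a mobility cost."""
    reduction = float(np.sum(var_before) - np.sum(var_after))
    return reduction - cost_weight * path_length


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Query grid over a 2-D slice of the field (illustrative geometry).
    grid = np.stack(np.meshgrid(np.linspace(0, 5, 10),
                                np.linspace(0, 5, 10)), axis=-1).reshape(-1, 2)
    x_obs = rng.uniform(0, 5, size=(8, 2))   # sample locations from one epoch
    y_obs = np.sin(x_obs[:, 0]) + 0.1 * rng.standard_normal(8)

    prior_var = np.full(len(grid), 1.0)      # variance before any samples
    _, post_var = gp_posterior(x_obs, y_obs, grid)
    print(epoch_reward(prior_var, post_var, path_length=12.0))
```

In this sketch the fusion center's role corresponds to calling `gp_posterior` once per epoch with all samples relayed so far; a planner would then compare candidate trajectories by their predicted `epoch_reward`.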