Mission Sequence Model and Deep Reinforcement Learning-Based Replanning Method for Multi-Satellite Observation

Authors

Li Peiyan, Cui Peixing, Wang Huiquan

Affiliation

School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China.

Publication

Sensors (Basel). 2025 Mar 10;25(6):1707. doi: 10.3390/s25061707.

DOI: 10.3390/s25061707

PMID: 40292786

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11946041/

Abstract

With the rapid increase in the number of Earth Observation Satellites (EOSs), research on autonomous mission scheduling has become increasingly critical for optimizing satellite sensor operations. While most existing studies focus on static environments or initial planning states, few address the challenge of dynamic request replanning for real-time sensor management. In this paper, we tackle the problem of rapid multi-satellite mission replanning under dynamically arriving batches of observation requests. The objective is to maximize overall observation revenue while minimizing disruptions to the original schedule. We propose a framework that integrates stochastic master-satellite mission allocation with single-satellite replanning, supported by reactive scheduling policies trained via deep reinforcement learning. Our approach leverages mission sequence modeling with attention mechanisms and time-attitude-aware rotary positional encoding to guide replanning. Additionally, scalable embeddings are employed to handle varying volumes of dynamic requests. The mission allocation phase efficiently generates assignment solutions using a pointer network, while the replanning phase introduces a hybrid action space for direct task insertion. Both phases are formulated as Markov Decision Processes (MDPs) and optimized using the Proximal Policy Optimization (PPO) algorithm. Extensive simulations demonstrate that our method significantly outperforms state-of-the-art approaches, achieving a 15.27% higher request insertion revenue rate and a 3.05% improvement in overall mission revenue rate, while maintaining a 1.17% lower modification rate and faster computation. These results demonstrate the effectiveness of our approach for real-world satellite sensor applications.
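
The abstract does not spell out how the time-attitude-aware rotary positional encoding is constructed. As a rough illustration only, the sketch below applies standard rotary positional encoding (RoPE) with continuous positions instead of integer sequence indices, and splits each task embedding so that one half is rotated by a time value and the other half by an attitude angle. The split, the function names `rotary` and `time_attitude_rope`, and the choice of inputs are hypothetical assumptions, not the authors' design. What it does show is the property that makes RoPE attractive for mission sequences: the dot product between two rotated embeddings depends only on their time and attitude differences, not on absolute values.

```python
import numpy as np

def rotary(x, pos, base=10000.0):
    """Standard RoPE rotation, but with a continuous position value per row.

    x:   (n, d) embeddings, d even
    pos: (n,) continuous positions (e.g., a hypothetical observation time)
    """
    x = np.asarray(x, dtype=float)
    pos = np.asarray(pos, dtype=float)
    d = x.shape[-1]
    if d % 2:
        raise ValueError("feature dimension must be even")
    freqs = base ** (-np.arange(0, d, 2) / d)   # one frequency per feature pair
    ang = pos[:, None] * freqs[None, :]          # (n, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    even, odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = even * cos - odd * sin        # 2-D rotation of each pair
    out[:, 1::2] = even * sin + odd * cos
    return out

def time_attitude_rope(x, t, att):
    """HYPOTHETICAL split: first half of features encodes time, second half attitude.

    Requires the feature dimension to be a multiple of 4.
    """
    x = np.asarray(x, dtype=float)
    h = x.shape[-1] // 2
    return np.concatenate([rotary(x[:, :h], t), rotary(x[:, h:], att)], axis=-1)
```

Because each pair is rotated rather than offset, vector norms are preserved, and shifting both tasks by the same time and attitude offset leaves their attention score unchanged.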
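
For the allocation phase, the abstract states only that a pointer network generates assignment solutions. A pointer network scores each input element with additive attention and normalizes the scores into a probability distribution over those elements. The sketch below shows one such decoding step, assuming (hypothetically) that the elements are candidate satellites for a single incoming request and that infeasible satellites are masked out. The weights `W1`, `W2`, `v` and the function `pointer_step` are illustrative, not taken from the paper. Sampling from the returned distribution, rather than always taking the argmax, is one way to obtain a stochastic allocation.

```python
import numpy as np

def pointer_step(enc, dec, W1, W2, v, mask):
    """One pointer-network attention step (additive attention + masked softmax).

    enc:  (n, h) encoded candidates (assumed here: satellites)
    dec:  (h,) decoder state for the request being assigned
    W1, W2: (a, h) projection matrices; v: (a,) scoring vector
    mask: (n,) bool, True where a candidate is infeasible
    Returns a probability distribution over the n candidates.
    At least one candidate must be feasible.
    """
    scores = np.tanh(enc @ W1.T + dec @ W2.T) @ v   # (n,) compatibility scores
    scores = np.where(mask, -np.inf, scores)         # infeasible -> zero probability
    scores = scores - scores.max()                   # numerically stable softmax
    p = np.exp(scores)
    return p / p.sum()
```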
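
The replanning phase uses a hybrid action space for direct task insertion, but its exact parameterization is not given in the abstract. One common way to build such a space is a discrete choice followed by a continuous parameter attached to that choice. The sketch below assumes, hypothetically, that the discrete part selects an insertion slot in a satellite's task sequence and the continuous part is a start time drawn from a Gaussian, then clipped to an assumed feasible window for that slot. The function `sample_hybrid_action` and all of its inputs are illustrative. It returns the joint log-probability of the unclipped sample, since a PPO-style update needs that quantity.

```python
import numpy as np

def sample_hybrid_action(logits, means, log_stds, windows, rng):
    """Sample a (discrete slot, continuous start time) action. All inputs HYPOTHETICAL.

    logits:   (k,) scores for k candidate insertion slots
    means, log_stds: (k,) Gaussian parameters of the start time for each slot
    windows:  (k, 2) assumed feasible [earliest, latest] start time per slot
    """
    logits = np.asarray(logits, dtype=float)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    slot = int(rng.choice(len(p), p=p))            # discrete part: which slot
    std = float(np.exp(log_stds[slot]))
    t_raw = float(rng.normal(means[slot], std))    # continuous part: start time
    # Joint log-probability: log P(slot) + log N(t_raw | mean, std)
    logp = (np.log(p[slot])
            - 0.5 * ((t_raw - means[slot]) / std) ** 2
            - np.log(std) - 0.5 * np.log(2 * np.pi))
    lo, hi = windows[slot]
    t_exec = float(np.clip(t_raw, lo, hi))         # executed time stays inside the window
    return slot, t_raw, t_exec, float(logp)
```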
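
Both phases are trained with PPO. For reference, the standard PPO clipped surrogate objective is sketched below as a loss to minimize, given log-probabilities of the taken actions under the new and old policies and their advantage estimates. The clipping threshold `eps=0.2` is a common default, not a value reported in this abstract, and the paper's full training loss (value and entropy terms, choice of advantage estimator) is not specified here.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a batch."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))  # pi_new / pi_old
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    # Taking the minimum makes the objective a pessimistic bound on improvement
    return -np.mean(np.minimum(unclipped, clipped))
```

With a positive advantage, the ratio's contribution is capped at `1 + eps`; with a negative advantage, the unclipped term is kept when it is worse, so large policy changes are not rewarded.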
