Deep Reinforcement Learning with Local Attention for Single Agile Optical Satellite Scheduling Problem

Author information

Liu Zheng, Xiong Wei, Han Chi, Yu Xiaolan

Affiliation

National Key Laboratory of Space Target Awareness, Space Engineering University, Beijing 101416, China.

Publication information

Sensors (Basel). 2024 Oct 2;24(19):6396. doi: 10.3390/s24196396.

DOI:10.3390/s24196396

PMID:39409435

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11479382/

Abstract

This paper investigates the single agile optical satellite scheduling problem, which has received increasing attention due to the rapid growth in Earth observation requirements. Because the problem has complicated constraints and a large solution space, conventional exact and heuristic methods are sensitive to problem scale and computationally expensive. To solve the problem efficiently, this paper proposes a deep reinforcement learning algorithm with a local attention mechanism. First, a mathematical model is established that accounts for a series of complex constraints and takes the profit ratio of completed tasks as the optimization objective. Then, a neural network with an encoder-decoder structure is adopted to generate high-quality solutions, and a local attention mechanism is designed to improve solution generation. In addition, an adaptive learning rate strategy is proposed that lets the actor-critic training algorithm adjust the learning rate dynamically during training, improving the training effectiveness of the proposed network. Finally, extensive experiments verify that the proposed algorithm outperforms the comparison algorithms in solution quality, generalization performance, and computational efficiency.
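The optimization objective named in the abstract, the profit ratio of completed tasks, can be illustrated with a minimal sketch. It assumes the ratio is the summed profit of completed tasks divided by the summed profit of all tasks; the abstract does not give the exact formula, and the function name, argument names, and the five-task instance below are hypothetical.

```python
from typing import Sequence


def profit_ratio(profits: Sequence[float], completed: Sequence[bool]) -> float:
    """Share of the total task profit earned by the completed tasks.

    Assumed definition (not taken from the paper's model):
    sum of profits of completed tasks / sum of profits of all tasks.
    """
    if len(profits) != len(completed):
        raise ValueError("profits and completed must have the same length")
    total = sum(profits)
    if total <= 0:
        raise ValueError("total profit must be positive")
    earned = sum(p for p, done in zip(profits, completed) if done)
    return earned / total


# Hypothetical instance: tasks 0, 2 and 3 are scheduled, tasks 1 and 4 are not.
example_profits = [4.0, 2.0, 1.0, 3.0, 10.0]
example_completed = [True, False, True, True, False]
example_ratio = profit_ratio(example_profits, example_completed)
```

Under this assumed definition the ratio lies in [0, 1], so it could serve directly as an episode-level reward for a learned scheduling policy, although the abstract does not state how the reward is actually defined.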
