Huang Wei, Zhou Jianzhong, Zhang Dongying
School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China.
Sensors (Basel). 2021 Apr 23;21(9):2971. doi: 10.3390/s21092971.
Remotely sensed (RS) satellite image fusion is indispensable for generating long-term, gap-free Earth observation data. Although cloud computing (CC) offers a holistic approach to RS big data (RSBD), the fundamental question of how to fuse RSBD efficiently on CC platforms has not yet been settled. To this end, in this study we propose a lightweight cloud-native framework for the elastic processing of RSBD. Using the scaling mechanisms provided by both the Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) layers of CC, the Spark-on-Kubernetes operator model running in the framework can improve the efficiency of Spark-based algorithms without requiring users to deal with bottlenecks such as task latency caused by unbalanced workloads, and it eases the burden of tuning the performance parameters of parallel algorithms. Internally, we propose a task scheduling mechanism (TSM) that dynamically changes the affinities of Spark executor pods to computing hosts. The TSM learns the workload of each computing host from the ratio between the numbers of completed and failed tasks on that host, and dispatches Spark executor pods to newer, less heavily loaded hosts. To illustrate the advantages of the framework, we implement a parallel enhanced spatial and temporal adaptive reflectance fusion model (PESTARFM) that enables the efficient fusion of big RS images with a Spark aggregation function. We construct an OpenStack cloud computing environment to test the usability of the framework. In the experiments, the TSM improves the performance of the PESTARFM by about 11.7% when only PaaS scaling is used. When both IaaS and PaaS scaling are used, the maximum performance gain from the TSM can exceed 13.6%. Fusing these large Sentinel and PlanetScope images takes less than 4 min in the experimental environment.
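The host-selection idea behind the TSM can be illustrated with a small sketch. It is not the authors' implementation. The class `HostStats`, the functions `rank_hosts` and `node_affinity`, the failure-ratio formula `failed / (completed + 1)`, the tie-breaking rule, and the weighting scheme are all hypothetical choices. With the `+ 1` term, a host with no task history, such as a newly added VM, gets a ratio of 0 and is preferred. The output follows the standard Kubernetes `nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution` structure keyed on the `kubernetes.io/hostname` label. One possible way to apply it would be to render it into an executor pod template (for example, via Spark's `spark.kubernetes.executor.podTemplateFile` setting), though the paper does not say how the TSM injects affinities.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HostStats:
    """Hypothetical per-host task counters, e.g. aggregated from Spark executor events."""
    name: str
    completed: int = 0
    failed: int = 0

    def failure_ratio(self) -> float:
        # Failed tasks relative to completed tasks; the +1 avoids division by zero
        # and gives a host with no history (e.g. a newly added VM) a ratio of 0.
        return self.failed / (self.completed + 1)


def rank_hosts(stats: list[HostStats]) -> list[HostStats]:
    """Order hosts from least to most overwhelmed.

    Primary key: failure ratio (lower is better).
    Tie-breakers: fewer completed tasks (a less-used host), then name for determinism.
    """
    return sorted(stats, key=lambda h: (h.failure_ratio(), h.completed, h.name))


def node_affinity(stats: list[HostStats], top_k: int = 2) -> dict:
    """Build a Kubernetes node-affinity stanza that prefers the top_k best-ranked hosts.

    Preferred-scheduling weights must lie in 1..100; the best host gets 100 and
    each following host 10 less (a hypothetical weighting scheme).
    """
    terms = []
    for rank, host in enumerate(rank_hosts(stats)[:top_k]):
        terms.append({
            "weight": max(1, 100 - 10 * rank),
            "preference": {
                "matchExpressions": [{
                    "key": "kubernetes.io/hostname",
                    "operator": "In",
                    "values": [host.name],
                }]
            },
        })
    return {"nodeAffinity": {"preferredDuringSchedulingIgnoredDuringExecution": terms}}


if __name__ == "__main__":
    hosts = [
        HostStats("node-a", completed=120, failed=12),  # busy host with some failures
        HostStats("node-b", completed=80, failed=2),
        HostStats("node-c"),                            # newly added host, no history
    ]
    print([h.name for h in rank_hosts(hosts)])
    print(node_affinity(hosts, top_k=2))
```

With these sample counters, the failure ratios are 0 for `node-c`, about 0.025 for `node-b`, and about 0.099 for `node-a`. Executor pods would therefore be steered first toward `node-c` and then toward `node-b`. Preferred affinities are used here, not required ones, so the Kubernetes scheduler can still place a pod elsewhere if the preferred hosts lack capacity.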