Consoli Davide, Parente Leandro, Simoes Rolf, Şahin Murat, Tian Xuemeng, Witjes Martijn, Sloat Lindsey, Hengl Tomislav
OpenGeoHub Foundation, Doorwerth, Netherlands.
Laboratory of Geo-Information Science and Remote Sensing, Wageningen University and Research, Wageningen, Netherlands.
PeerJ. 2024 Dec 4;12:e18585. doi: 10.7717/peerj.18585. eCollection 2024.
Processing large collections of earth observation (EO) time-series, often petabyte-sized, such as those from NASA's Landsat and ESA's Sentinel missions, can be computationally prohibitive and costly. Despite their name, even the Analysis Ready Data (ARD) versions of such collections can rarely be used as direct input for modeling because of cloud presence and/or prohibitive storage size. Existing solutions for readily using these data are not openly available, perform poorly, or lack flexibility. To address this issue, we developed TSIRF (Time-Series Iteration-free Reconstruction Framework), a computational framework that can be applied to diverse time-series processing tasks, such as temporal aggregation and time-series reconstruction, simply by adjusting the convolution kernel. As the first large-scale application, TSIRF was employed to process the entire Global Land Analysis and Discovery (GLAD) ARD Landsat archive, producing a cloud-free bi-monthly aggregated product. This process covered seven Landsat bands globally from 1997 to 2022, comprising more than two trillion pixels, each with a time-series of 156 samples in the aggregated product, and required approximately 28 hours of computation using 1248 Intel Xeon Gold 6248R CPUs. The quality of the result was assessed using a benchmark dataset derived from the aggregated product and by comparing different imputation strategies. The resulting reconstructed images can be used as input for machine learning models or to map biophysical indices. To further limit the storage size, the produced data were saved as 8-bit Cloud-Optimized GeoTIFFs (COG). With about 20 TB per band/index hosted for the entire 30 m resolution bi-monthly historical time-series and distributed as open data, the product enables seamless, fast, and affordable access to the Landsat archive for environmental monitoring and analysis applications.
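The abstract states that temporal aggregation and gap reconstruction can both be expressed as a convolution whose behaviour depends on the kernel, but it does not describe TSIRF's internals. As a rough illustration only, the sketch below uses normalized convolution, a generic gap-filling technique chosen here as an assumption and not the authors' implementation. The names `exp_kernel`, `reconstruct`, and `N_SAMPLES` are hypothetical, and the kernel is assumed to be symmetric, of odd length, and no longer than the series.

```python
import numpy as np

# 26 years (1997-2022) x 6 bi-monthly periods per year = 156 samples per pixel,
# matching the time-series length quoted in the abstract.
N_SAMPLES = (2022 - 1997 + 1) * 6


def exp_kernel(half_width, decay):
    """Hypothetical symmetric kernel: weights shrink with temporal distance."""
    offsets = np.arange(-half_width, half_width + 1)
    return decay ** np.abs(offsets)


def reconstruct(series, valid, kernel):
    """Fill gaps with normalized convolution (illustrative, not the TSIRF code).

    series: 1-D values; entries at invalid positions are ignored (may be NaN).
    valid:  1-D mask, 1 where the sample is cloud-free, 0 otherwise.
    kernel: symmetric odd-length weights, no longer than the series.
    """
    series = np.asarray(series, dtype=float)
    valid = np.asarray(valid, dtype=float)
    kernel = np.asarray(kernel, dtype=float)

    # Invalid samples are zeroed so they add nothing to the weighted sum.
    weighted = np.convolve(np.where(valid > 0, series, 0.0), kernel, mode="same")
    # Convolving the mask gives the total weight of valid neighbours.
    weights = np.convolve(valid, kernel, mode="same")

    # Weighted mean of valid neighbours; NaN where no valid neighbour is in reach.
    filled = np.full(series.shape, np.nan)
    np.divide(weighted, weights, out=filled, where=weights > 0)

    # Observed samples are kept unchanged; only gaps take the estimate.
    return np.where(valid > 0, series, filled)


# Example: one pixel with a simulated seasonal signal and random cloud gaps.
rng = np.random.default_rng(0)
t = np.arange(N_SAMPLES)
signal = 100 + 50 * np.sin(2 * np.pi * t / 6)
mask = (rng.random(N_SAMPLES) > 0.3).astype(float)
gap_filled = reconstruct(np.where(mask > 0, signal, np.nan), mask, exp_kernel(3, 0.5))
```

Swapping `exp_kernel(3, 0.5)` for a flat kernel such as `np.ones(3)` turns the same call into a cloud-aware moving average, which is the sense in which one convolution routine can serve several processing tasks. Because the result comes from two convolutions and one division, no per-pixel iteration is needed.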