Zhang Jianbo, Ye Zhuangzhuang, Zheng Kai
School of Geography Information Engineering, China University of Geosciences, Wuhan 430074, China.
Sensors (Basel). 2021 Jan 7;21(2):365. doi: 10.3390/s21020365.
Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer abundant and valuable information but also pose enormous computational challenges for existing focal statistics algorithms. Simply running such applications on the in-memory computing framework Spark may lead to performance problems, because Spark lacks native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. The approach processes large volumes of terrain data efficiently in three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files, using a tile-based multifile storage strategy suited to the Hadoop Distributed File System (HDFS); (2) running the slope algorithm, a representative focal operation, on these tile files using a dynamic calculation window (DCW) computing strategy; and (3) writing back the calculation results and merging them into a single raster file. Experiments with digital elevation data of Australia show that the proposed approach effectively improves the parallel performance of focal statistics algorithms. The results also show that the approach achieves almost the same calculation accuracy as ArcGIS, and that it scales well as the number of Spark executors in the cluster increases.
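The three steps above can be illustrated with a minimal single-machine sketch in plain Python. It is not the authors' implementation: the DCW strategy is not specified in the abstract, so the sketch substitutes a fixed one-cell halo around each tile, and it assumes Horn's 3x3 slope formula with out-of-range neighbours replaced by the centre value. All function names are hypothetical, and the sequential loop over tiles stands in for what would be a parallel map over tile records in Spark.

```python
import math


def horn_slope(window, cell_size):
    """Slope in degrees at the centre of a 3x3 window (Horn's method, assumed)."""
    (a, b, c), (d, _e, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def window_at(grid, r, c):
    """3x3 window around (r, c); neighbours outside the grid reuse the centre value."""
    rows, cols = len(grid), len(grid[0])
    centre = grid[r][c]
    return [
        [grid[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else centre
         for cc in (c - 1, c, c + 1)]
        for rr in (r - 1, r, r + 1)
    ]


def slope_raster(dem, cell_size):
    """Reference result: slope computed over the whole raster at once."""
    return [
        [horn_slope(window_at(dem, r, c), cell_size) for c in range(len(dem[0]))]
        for r in range(len(dem))
    ]


def split_with_halo(dem, tile, halo=1):
    """Step 1: cut the DEM into square tiles, each padded with a halo of real
    neighbouring cells (clipped at the raster boundary)."""
    rows, cols = len(dem), len(dem[0])
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r_lo, c_lo = max(r0 - halo, 0), max(c0 - halo, 0)
            r_hi = min(r0 + tile + halo, rows)
            c_hi = min(c0 + tile + halo, cols)
            block = [row[c_lo:c_hi] for row in dem[r_lo:r_hi]]
            # (r0 - r_lo, c0 - c_lo) locates the tile's first cell inside the block.
            yield r0, c0, r0 - r_lo, c0 - c_lo, block


def slope_tile(block, top, left, height, width, cell_size):
    """Step 2: slope for the tile's own cells only, reading the halo as input."""
    return [
        [horn_slope(window_at(block, top + r, left + c), cell_size)
         for c in range(width)]
        for r in range(height)
    ]


def tiled_slope(dem, tile, cell_size):
    """Step 3: write each tile result back into one merged output raster."""
    rows, cols = len(dem), len(dem[0])
    merged = [[0.0] * cols for _ in range(rows)]
    for r0, c0, top, left, block in split_with_halo(dem, tile):
        height, width = min(tile, rows - r0), min(tile, cols - c0)
        result = slope_tile(block, top, left, height, width, cell_size)
        for r in range(height):
            merged[r0 + r][c0:c0 + width] = result[r]
    return merged
```

Because every tile carries real neighbouring cells across interior tile borders, the merged result should match a whole-raster computation cell for cell, which is the property that lets the tiles be processed independently.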