

A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark.

Author information

Zhang Jianbo, Ye Zhuangzhuang, Zheng Kai

Affiliation

School of Geography Information Engineering, China University of Geosciences, Wuhan 430074, China.

Publication information

Sensors (Basel). 2021 Jan 7;21(2):365. doi: 10.3390/s21020365.

Abstract

Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-memory computing framework Spark to serve such applications might incur performance issues due to its lack of native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. This approach implements efficient manipulation of large amounts of terrain data through three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files by adopting a tile-based multifile storing strategy suitable for the Hadoop Distributed File System (HDFS), (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy, and (3) writing back and merging the calculation results into a whole raster file. Experiments with the digital elevation data of Australia show that the proposed computing approach can effectively improve the parallel performance of focal statistics algorithms. The results also show that the approach has almost the same calculation accuracy as that of ArcGIS. The proposed approach also exhibits good scalability when the number of Spark executors in clusters is increased.
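The three steps in the abstract (tile, compute slope per tile, merge) can be illustrated with a minimal single-machine sketch. Everything here is an assumption made for illustration and not the authors' implementation: the slope formula is Horn's 3x3 method, chosen because the abstract compares results against ArcGIS; the one-cell halo around each tile is a simple stand-in for the dynamic calculation window (DCW) strategy, whose details the abstract does not give; edge cells are handled by clamping; and a plain Python `map` takes the place of a Spark transformation over tile files on HDFS. All function names are hypothetical.

```python
import math


def horn_slope_deg(win, cell):
    """Slope in degrees at the centre of a 3x3 window (Horn's method, assumed)."""
    (a, b, c), (d, _, f), (g, h, i) = win
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))


def split_tiles(dem, size):
    """Step 1: yield (row0, col0, tile); each square tile carries a 1-cell halo.

    Halo cells that fall outside the DEM are clamped to the nearest edge
    cell (an assumption about edge handling).
    """
    rows, cols = len(dem), len(dem[0])
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            r1, c1 = min(r0 + size, rows), min(c0 + size, cols)
            tile = [
                [dem[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]
                 for c in range(c0 - 1, c1 + 1)]
                for r in range(r0 - 1, r1 + 1)
            ]
            yield r0, c0, tile


def slope_tile(item, cell):
    """Step 2: run the focal (3x3) slope computation on one haloed tile."""
    r0, c0, tile = item
    h, w = len(tile) - 2, len(tile[0]) - 2
    out = [
        [horn_slope_deg([row[c:c + 3] for row in tile[r:r + 3]], cell)
         for c in range(w)]
        for r in range(h)
    ]
    return r0, c0, out


def merge(results, rows, cols):
    """Step 3: write each tile's result back into one full raster."""
    full = [[0.0] * cols for _ in range(rows)]
    for r0, c0, out in results:
        for dr, row in enumerate(out):
            full[r0 + dr][c0:c0 + len(row)] = row
    return full


def slope_raster(dem, size, cell):
    """Tile, compute and merge; the map over tiles is the parallelisable part."""
    tiles = split_tiles(dem, size)
    results = map(lambda t: slope_tile(t, cell), tiles)
    return merge(results, len(dem), len(dem[0]))
```

Because every tile carries the neighbours its border cells need, the tiled result is identical to processing the raster as a single tile, which is the property that makes the per-tile step safe to run independently. For example, on a plane that rises one unit per cell with a cell size of 1, `slope_raster` gives 45 degrees at interior cells regardless of tile size.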


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/b1f847a7129c/sensors-21-00365-g001.jpg
