A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark.

Authors

Zhang Jianbo, Ye Zhuangzhuang, Zheng Kai

Affiliation

School of Geography Information Engineering, China University of Geosciences, Wuhan 430074, China.

Publication

Sensors (Basel). 2021 Jan 7;21(2):365. doi: 10.3390/s21020365.

DOI: 10.3390/s21020365
PMID: 33430375
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7827788/
Abstract

Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-memory computing framework Spark to serve such applications might incur performance issues due to its lack of native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. This approach implements efficient manipulation of large amounts of terrain data through three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files by adopting a tile-based multifile storing strategy suitable for the Hadoop Distributed File System (HDFS), (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy, and (3) writing back and merging the calculation results into a whole raster file. Experiments with the digital elevation data of Australia show that the proposed computing approach can effectively improve the parallel performance of focal statistics algorithms. The results also show that the approach has almost the same calculation accuracy as that of ArcGIS. The proposed approach also exhibits good scalability when the number of Spark executors in clusters is increased.
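As a rough, single-machine illustration of the three steps in the abstract, the plain-Python sketch below cuts a small DEM into square tiles, computes slope on each tile independently, and merges the pieces back into one grid. It is not the authors' implementation: Spark, HDFS and the paper's dynamic calculation window (DCW) strategy are omitted, and the one-cell halo (overlap) stored with each tile is an assumption standing in for however the paper supplies neighbours at tile edges. The slope kernel is Horn's 3×3 finite-difference formula, the one ArcGIS describes for its Slope tool. All function names (`horn_slope`, `slope_raster`, `split_tiles`, `slope_tile`, `merge_tiles`) are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple

Grid = List[List[float]]
# ((core_row, core_col), (padded_row, padded_col), padded_block)
Tile = Tuple[Tuple[int, int], Tuple[int, int], Grid]


def horn_slope(win: Grid, cell: float) -> float:
    """Slope in degrees of the centre cell of a 3x3 window (Horn's method).

    Window layout:  a b c
                    d e f
                    g h i
    """
    (a, b, c), (d, _e, f), (g, h, i) = win
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))


def slope_raster(dem: Grid, cell: float) -> List[List[Optional[float]]]:
    """Reference result over the whole DEM. Simplification: raster border
    cells have no full 3x3 window and are left as None."""
    rows, cols = len(dem), len(dem[0])
    out: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            win = [row[c - 1:c + 2] for row in dem[r - 1:r + 2]]
            out[r][c] = horn_slope(win, cell)
    return out


def split_tiles(dem: Grid, size: int, halo: int = 1) -> List[Tile]:
    """Step 1: cut the DEM into square tiles. Assumption (not the paper's
    DCW): each tile is padded with `halo` neighbouring cells, clipped at
    the raster edge. slope_tile below requires halo >= 1."""
    rows, cols = len(dem), len(dem[0])
    tiles: List[Tile] = []
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            pr, pc = max(r0 - halo, 0), max(c0 - halo, 0)
            block = [row[pc:min(c0 + size + halo, cols)]
                     for row in dem[pr:min(r0 + size + halo, rows)]]
            tiles.append(((r0, c0), (pr, pc), block))
    return tiles


def slope_tile(tile: Tile, size: int, rows: int, cols: int,
               cell: float) -> Dict[Tuple[int, int], float]:
    """Step 2: slope for the tile's core cells, keyed by global (row, col).
    Only the padded block is read, so tiles can be processed independently."""
    (r0, c0), (pr, pc), block = tile
    result: Dict[Tuple[int, int], float] = {}
    for r in range(max(r0, 1), min(r0 + size, rows - 1)):
        for c in range(max(c0, 1), min(c0 + size, cols - 1)):
            lr, lc = r - pr, c - pc  # position inside the padded block
            win = [row[lc - 1:lc + 2] for row in block[lr - 1:lr + 2]]
            result[(r, c)] = horn_slope(win, cell)
    return result


def merge_tiles(parts: Iterable[Dict[Tuple[int, int], float]],
                rows: int, cols: int) -> List[List[Optional[float]]]:
    """Step 3: write every tile's results back into one output grid."""
    out: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    for part in parts:
        for (r, c), value in part.items():
            out[r][c] = value
    return out


if __name__ == "__main__":
    dem = [[float((r * r + 3 * c) % 11) * 10 for c in range(9)] for r in range(7)]
    whole = slope_raster(dem, 30.0)
    tiles = split_tiles(dem, 3)
    tiled = merge_tiles([slope_tile(t, 3, 7, 9, 30.0) for t in tiles], 7, 9)
    print(whole == tiled)  # prints True
```

Because each tile carries its neighbouring cells, every 3×3 window at an internal tile edge is complete, so the merged output matches the whole-raster computation exactly; without the halo, cells along tile boundaries would have no valid window. In a Spark job one would presumably turn each tile into a record and apply `slope_tile` in a `map`, but that mapping is likewise an assumption, not the paper's code.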


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/b1f847a7129c/sensors-21-00365-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/72c5d40a27c7/sensors-21-00365-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/93dfd8005ae5/sensors-21-00365-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/afc513c2f213/sensors-21-00365-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/cdb5833fdd1b/sensors-21-00365-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/92597776d3f4/sensors-21-00365-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/7fd41494e6f6/sensors-21-00365-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/55cb83965475/sensors-21-00365-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/53b364d71224/sensors-21-00365-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/c9830edf56de/sensors-21-00365-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/b25298afed3c/sensors-21-00365-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/11d628219b01/sensors-21-00365-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/e03c7c587634/sensors-21-00365-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/bd41b2d3f676/sensors-21-00365-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/2fff90c639e2/sensors-21-00365-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/1161a99e5b90/sensors-21-00365-g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/b850b7062a43/sensors-21-00365-g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1799/7827788/8e73bae05f91/sensors-21-00365-g018.jpg

Similar articles

1. A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark.
   Sensors (Basel). 2021 Jan 7;21(2):365. doi: 10.3390/s21020365.
2. A distributed computing model for big data anonymization in the networks.
   PLoS One. 2023 Apr 28;18(4):e0285212. doi: 10.1371/journal.pone.0285212. eCollection 2023.
3. On-the-Fly Fusion of Remotely-Sensed Big Data Using an Elastic Computing Paradigm with a Containerized Spark Engine on Kubernetes.
   Sensors (Basel). 2021 Apr 23;21(9):2971. doi: 10.3390/s21092971.
4. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.
   BioData Min. 2014 Oct 29;7:22. doi: 10.1186/1756-0381-7-22. eCollection 2014.
5. A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark.
   Entropy (Basel). 2023 Jan 31;25(2):259. doi: 10.3390/e25020259.
6. ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark.
   BMC Bioinformatics. 2019 Feb 14;20(1):76. doi: 10.1186/s12859-019-2665-0.
7. MISS-D: A fast and scalable framework of medical image storage service based on distributed file system.
   Comput Methods Programs Biomed. 2020 Apr;186:105189. doi: 10.1016/j.cmpb.2019.105189. Epub 2019 Nov 14.
8. Big Data in metagenomics: Apache Spark vs MPI.
   PLoS One. 2020 Oct 6;15(10):e0239741. doi: 10.1371/journal.pone.0239741. eCollection 2020.
9. Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files.
   Gigascience. 2018 Jun 1;7(6). doi: 10.1093/gigascience/giy052.
10. Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics.
   BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):138. doi: 10.1186/s12859-019-2694-8.

Cited by

1. Development of a Low-Cost Distributed Computing Pipeline for High-Throughput Cotton Phenotyping.
   Sensors (Basel). 2024 Feb 2;24(3):970. doi: 10.3390/s24030970.
2. An Efficient Group-Based Replica Placement Policy for Large-Scale Geospatial 3D Raster Data on Hadoop.
   Sensors (Basel). 2021 Dec 5;21(23):8132. doi: 10.3390/s21238132.
