An Efficient Group-Based Replica Placement Policy for Large-Scale Geospatial 3D Raster Data on Hadoop.

Authors

Liu Zhipeng, Hua Weihua, Liu Xiuguo, Liang Dong, Zhao Yabo, Shi Manxing

Affiliation

School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China.

Publication information

Sensors (Basel). 2021 Dec 5;21(23):8132. doi: 10.3390/s21238132.

DOI:10.3390/s21238132
PMID:34884135
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8662431/
Abstract

Geospatial three-dimensional (3D) raster data have been widely used for simple representations and analysis, such as geological models, spatio-temporal satellite data, hyperspectral images, and climate data. With the increasing requirements of resolution and accuracy, the amount of geospatial 3D raster data has grown exponentially. In recent years, the processing of large raster data using Hadoop has gained popularity. However, data uploaded to Hadoop are randomly distributed onto datanodes without consideration of the spatial characteristics. As a result, the direct processing of geospatial 3D raster data produces a massive network data exchange among the datanodes and degrades the performance of the cluster. To address this problem, we propose an efficient group-based replica placement policy for large-scale geospatial 3D raster data, aiming to optimize the locations of the replicas in the cluster to reduce the network overhead. An overlapped group scheme was designed for three replicas of each file. The data in each group were placed in the same datanode, and different colocation patterns for three replicas were implemented to further reduce the communication between groups. The experimental results show that our approach significantly reduces the network overhead during data acquisition for 3D raster data in the Hadoop cluster, and maintains the Hadoop replica placement requirements.
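
The abstract does not spell out the grouping rule or the colocation patterns, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes blocks are numbered in a linear order that reflects spatial adjacency, that each replica uses groups of the same size shifted by a different offset (one possible reading of "overlapped groups"), and that datanodes are split into one disjoint pool per replica so the replicas of a block never share a node. The names `place_replicas` and `colocated` are hypothetical.

```python
def place_replicas(num_blocks, group_size, datanodes, replicas=3):
    """Map each block index to a list of datanodes, one per replica.

    Assumption (not from the paper): replica r groups blocks with a
    boundary shifted by r * group_size // replicas, so neighbours that
    are split by a group boundary in one replica share a group in another.
    """
    if len(datanodes) < replicas:
        raise ValueError("need at least one datanode per replica")
    # One disjoint pool of datanodes per replica keeps the replicas of
    # any block on distinct nodes.
    pools = [datanodes[r::replicas] for r in range(replicas)]
    placement = {}
    for block in range(num_blocks):
        nodes = []
        for r in range(replicas):
            shift = r * group_size // replicas
            group = (block + shift) // group_size
            pool = pools[r]
            # Every block of a group goes to the same datanode.
            nodes.append(pool[group % len(pool)])
        placement[block] = nodes
    return placement


def colocated(placement, a, b):
    """Return True if blocks a and b have a replica on a common datanode."""
    return bool(set(placement[a]) & set(placement[b]))


# Hypothetical six-node cluster and a file of 24 blocks.
nodes = ["dn0", "dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(num_blocks=24, group_size=6, datanodes=nodes)
print(placement[5], placement[6])
```

With these particular parameters, blocks 5 and 6 fall into different groups for the first replica but into the same group for the other two, so a task that needs both can still read them from a single datanode. How the published policy chooses groups, offsets, and nodes may differ substantially from this sketch.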

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/f3766c6055b5/sensors-21-08132-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/d1098e5bf5d1/sensors-21-08132-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/7611e3683093/sensors-21-08132-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/0c133ae406e6/sensors-21-08132-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/b798ac553ff0/sensors-21-08132-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/efa11e0782a8/sensors-21-08132-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/57be3c21e8d5/sensors-21-08132-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/eb2e387ba1d6/sensors-21-08132-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/bfc3d51fb5e2/sensors-21-08132-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc5/8662431/1d5623beb792/sensors-21-08132-g010.jpg

Similar articles

1
An Efficient Group-Based Replica Placement Policy for Large-Scale Geospatial 3D Raster Data on Hadoop.
Sensors (Basel). 2021 Dec 5;21(23):8132. doi: 10.3390/s21238132.
2
A distributed data processing scheme based on Hadoop for synchrotron radiation experiments.
J Synchrotron Radiat. 2024 May 1;31(Pt 3):635-645. doi: 10.1107/S1600577524002637. Epub 2024 Apr 24.
3
HaRD: a heterogeneity-aware replica deletion for HDFS.
J Big Data. 2019;6(1):94. doi: 10.1186/s40537-019-0256-6. Epub 2019 Oct 21.
4
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.
Proceedings VLDB Endowment. 2013 Aug;6(11).
5
A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark.
Sensors (Basel). 2021 Jan 7;21(2):365. doi: 10.3390/s21020365.
6
Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine.
Proc SPIE Int Soc Opt Eng. 2017 Feb 11;10138. doi: 10.1117/12.2254712. Epub 2017 Mar 13.
7
EStore: A User-Friendly Encrypted Storage Scheme for Distributed File Systems.
Sensors (Basel). 2023 Oct 17;23(20):8526. doi: 10.3390/s23208526.
8
STDADS: An Efficient Slow Task Detection Algorithm for Deadline Schedulers.
Big Data. 2020 Feb;8(1):62-69. doi: 10.1089/big.2019.0039. Epub 2020 Jan 29.
9
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.
Proc ACM SIGSPATIAL Int Conf Adv Inf. 2013 Nov;2013:528-531. doi: 10.1145/2525314.2525320.
10
A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.
Gigascience. 2015 Jun 4;4:26. doi: 10.1186/s13742-015-0058-5. eCollection 2015.
