LocationSpark: In-memory Distributed Spatial Query Processing and Optimization.

Authors

Tang Mingjie, Yu Yongyang, Mahmood Ahmed R, Malluhi Qutaibah M, Ouzzani Mourad, Aref Walid G

Affiliations

Chinese Academy of Science, Beijing, China.

Facebook, Menlo Park, CA, United States.

Publication information

Front Big Data. 2020 Oct 16;3:30. doi: 10.3389/fdata.2020.00030. eCollection 2020.

DOI:10.3389/fdata.2020.00030
PMID:33693403
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7931877/
Abstract

Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques that handle the query skew that commonly occurs in practice and minimize communication costs accordingly. We propose a distributed query scheduler that uses a new cost model to minimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. The query scheduler utilizes new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. Our prototype system is termed LocationSpark. The experimental study is based on real datasets and demonstrates that LocationSpark can enhance distributed spatial query processing by up to an order of magnitude over existing in-memory and distributed spatial systems.
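The abstract mentions bitmap-based filters that let the scheduler forward a query only to the nodes that may hold matching data, but it does not describe how those filters are built. The following Python sketch illustrates the general idea under stated assumptions: each partition keeps an occupancy bitmap over a fixed uniform grid, and a range query is forwarded to a partition only if it overlaps at least one occupied cell. The names `GridBitmapFilter` and `route`, the 8x8 grid, and the sample data are hypothetical and are not taken from LocationSpark.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class GridBitmapFilter:
    """Hypothetical occupancy bitmap over one partition's bounding box."""
    bounds: Rect
    cells_per_side: int = 8  # assumed grid resolution
    bits: int = 0            # bit i is set when grid cell i holds a point

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        # Map a coordinate to its (column, row) grid cell, clamped to the grid.
        xmin, ymin, xmax, ymax = self.bounds
        n = self.cells_per_side
        col = min(n - 1, max(0, int((x - xmin) / (xmax - xmin) * n)))
        row = min(n - 1, max(0, int((y - ymin) / (ymax - ymin) * n)))
        return col, row

    def add(self, point: Point) -> None:
        col, row = self._cell(*point)
        self.bits |= 1 << (row * self.cells_per_side + col)

    def may_contain(self, query: Rect) -> bool:
        # False means "certainly no match"; True may be a false positive.
        xmin, ymin, xmax, ymax = self.bounds
        qx0, qy0, qx1, qy1 = query
        if qx1 < xmin or qx0 > xmax or qy1 < ymin or qy0 > ymax:
            return False
        c0, r0 = self._cell(max(qx0, xmin), max(qy0, ymin))
        c1, r1 = self._cell(min(qx1, xmax), min(qy1, ymax))
        n = self.cells_per_side
        for row in range(r0, r1 + 1):
            for col in range(c0, c1 + 1):
                if (self.bits >> (row * n + col)) & 1:
                    return True
        return False


def route(query: Rect, filters: Dict[int, GridBitmapFilter]) -> List[int]:
    """Return the ids of the partitions the query must be forwarded to."""
    return [pid for pid, f in filters.items() if f.may_contain(query)]


# Two made-up partitions with a few points each.
filters = {
    0: GridBitmapFilter(bounds=(0.0, 0.0, 10.0, 10.0)),
    1: GridBitmapFilter(bounds=(10.0, 0.0, 20.0, 10.0)),
}
for p in [(1.0, 1.0), (2.0, 2.0)]:
    filters[0].add(p)
filters[1].add((15.0, 5.0))
```

In this sketch a query such as `(8, 8, 12, 9)` overlaps the bounding boxes of both partitions but touches no occupied cell, so `route` forwards it nowhere. A filter of this kind can report false positives but never false negatives, which is the property that makes it safe for pruning.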

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8431/7931877/5c8353a8390e/fdata-03-00030-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8431/7931877/808f8cb41b52/fdata-03-00030-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8431/7931877/63a42a697785/fdata-03-00030-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8431/7931877/909b5e7652fa/fdata-03-00030-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8431/7931877/784bd0335b29/fdata-03-00030-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8431/7931877/bef7a7735a75/fdata-03-00030-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8431/7931877/79a068685c27/fdata-03-00030-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8431/7931877/96092fb290dc/fdata-03-00030-g0008.jpg
