Big Data Analytics with Datalog Queries on Spark.

Authors

Shkapsky Alexander, Yang Mohan, Interlandi Matteo, Chiu Hsuan, Condie Tyson, Zaniolo Carlo

Affiliation

University of California, Los Angeles.

Publication

Proc ACM SIGMOD Int Conf Manag Data. 2016 Jun-Jul;2016:1135-1149. doi: 10.1145/2882903.2915229.

DOI: 10.1145/2882903.2915229

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5470845/

Abstract

There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics.
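
The abstract gives no example query, so the following is only an illustrative sketch of the kind of recursive Datalog query it refers to: transitive closure, evaluated here with semi-naive evaluation, a standard technique for Datalog recursion. It is plain single-machine Python, not the BigDatalog API or its Spark implementation; the rule text in the docstring, the relation names `arc` and `tc`, and the function name `transitive_closure` are assumptions chosen for illustration.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(arc: Set[Edge]) -> Set[Edge]:
    """Semi-naive evaluation of a hypothetical two-rule Datalog program:

        tc(X, Y) <- arc(X, Y).
        tc(X, Y) <- tc(X, Z), arc(Z, Y).
    """
    # Index arc by source node so the join tc(X, Z), arc(Z, Y) is a lookup.
    successors: Dict[str, Set[str]] = {}
    for src, dst in arc:
        successors.setdefault(src, set()).add(dst)

    tc: Set[Edge] = set(arc)     # all facts derived so far (base rule)
    delta: Set[Edge] = set(arc)  # facts that were new in the last iteration

    # Each round joins only the newly derived facts with arc, instead of
    # re-joining the whole tc relation; stop when nothing new is derived.
    while delta:
        derived = {(x, y) for (x, z) in delta for y in successors.get(z, ())}
        delta = derived - tc
        tc |= delta
    return tc


if __name__ == "__main__":
    arcs = {("a", "b"), ("b", "c"), ("c", "d")}
    print(sorted(transitive_closure(arcs)))
```

The loop terminates because the set of possible pairs is finite and `delta` holds only pairs not seen before. A distributed system would additionally have to partition `tc` and `arc` across workers and iterate without recomputing earlier results, which is the kind of recursion support the abstract identifies as the important problem.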


