Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines.

Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, UFRGS/PPGC, Porto Alegre 91501-970, RS, Brazil.

LIG-ERODS, Université Grenoble Alpes, 38058 Grenoble, France.

Publication information

Sensors (Basel). 2022 Jun 23;22(13):4756. doi: 10.3390/s22134756.

DOI: 10.3390/s22134756

PMID: 35808249

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9269592/

Abstract

A significant rise in the adoption of streaming applications has changed decision-making processes in the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, the Spark Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines for stateless and stateful applications. Furthermore, the evaluation points out the Spark Streaming limitations that lead to in-memory-based issues for data-intensive pipelines and stateful applications. In addition, the work indicates potential solutions.

