A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization.

Affiliation

School of Cyber Security and Computer, Hebei University, Baoding 071000, China.

Publication information

Sensors (Basel). 2022 Aug 8;22(15):5930. doi: 10.3390/s22155930.

DOI:10.3390/s22155930
PMID:35957487
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9371413/
Abstract

Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It exposes more than 180 configuration parameters, which users typically set by hand based on their own experience. Because the parameters are numerous and inherently correlated, manual tuning is very tedious. To replace experience-based tuning, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a Spark application performance prediction model with deep neural networks and verified its accuracy and effectiveness from multiple perspectives. Second, to search for better configuration parameters more efficiently, we improved the Q-learning algorithm so that start and end states are set automatically in each training iteration, which remedies the agent's poor performance in exploring better configuration parameters. Lastly, with the default configuration as the baseline, experimental results show that the optimized configuration achieved average performance improvements of 47%, 43%, 31%, and 45% for four different types of Spark applications, indicating that the optimizer can efficiently find better configuration parameters and improve the performance of various Spark applications.
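The two-stage idea in the abstract (a performance prediction model, then Q-learning over configurations) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything below is an assumption made for illustration: the two-parameter search space, the toy `predict_runtime` formula standing in for the trained deep neural network, the reward (predicted runtime saved by a move), and the episode rule (start each episode from the best configuration seen so far and end it after a fixed number of steps). The abstract does not specify how start and end states are chosen.

```python
import random

# Hypothetical discretized search space for two real Spark properties
# (memory values are in GiB). The value grids are illustrative only.
SPACE = {
    "spark.executor.cores": [1, 2, 4, 8],
    "spark.executor.memory": [1, 2, 4, 8, 16],
}
PARAMS = list(SPACE)
# An action moves one parameter one step down or up in its value list.
ACTIONS = [(i, d) for i in range(len(PARAMS)) for d in (-1, 1)]


def predict_runtime(state):
    """Toy stand-in for the trained performance model (not the paper's DNN)."""
    cores = SPACE[PARAMS[0]][state[0]]
    mem = SPACE[PARAMS[1]][state[1]]
    return 100.0 / cores + 40.0 / mem + 2.0 * cores + 0.5 * mem


def step(state, action):
    """Apply an action (clamped to the grid); reward is predicted time saved."""
    i, d = action
    nxt = list(state)
    nxt[i] = min(max(nxt[i] + d, 0), len(SPACE[PARAMS[i]]) - 1)
    nxt = tuple(nxt)
    return nxt, predict_runtime(state) - predict_runtime(nxt)


def optimize(episodes=200, steps=20, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {}          # tabular Q-values keyed by (state, action)
    best = (0, 0)   # smallest values play the role of the baseline config
    for _ in range(episodes):
        state = best  # assumed rule: restart from the best state found so far
        for _ in range(steps):  # assumed rule: episode ends after `steps` moves
            if rng.random() < eps:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward = step(state, action)
            # Standard Q-learning update.
            target = reward + gamma * max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            state = nxt
            if predict_runtime(state) < predict_runtime(best):
                best = state
    config = {p: SPACE[p][i] for p, i in zip(PARAMS, best)}
    return config, predict_runtime(best)


if __name__ == "__main__":
    config, runtime = optimize()
    print("baseline predicted runtime:", predict_runtime((0, 0)))
    print("tuned config:", config, "predicted runtime:", runtime)
```

Querying a surrogate model instead of running a real Spark job for every candidate is what makes thousands of Q-learning steps affordable. In a real setting the quality of the tuned configuration depends on how accurate that model is.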

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6461/9371413/722761005fee/sensors-22-05930-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6461/9371413/daefe67edbd1/sensors-22-05930-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6461/9371413/280dd1aa532b/sensors-22-05930-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6461/9371413/74efd61ce143/sensors-22-05930-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6461/9371413/dd5ef16c553e/sensors-22-05930-g005.jpg
