Improving HPC System Performance by Predicting Job Resources via Supervised Machine Learning.

Authors

Tanash Mohammed, Dunn Brandon, Andresen Daniel, Hsu William, Yang Huichen, Okanlawon Adedolapo

Affiliation

Kansas State University, Manhattan, Kansas.

Publication

PEARC19 (2019). 2019 Jul;2019. doi: 10.1145/3332186.3333041. Epub 2019 Jul 28.

DOI:10.1145/3332186.3333041
PMID:35308798
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8932944/
Abstract

High-Performance Computing (HPC) systems are resources used for data capture, sharing, and analysis. Most of our HPC users come from disciplines other than Computer Science. HPC users, including computer scientists, have difficulty deciding how many resources their submitted jobs require and do not feel proficient enough to do so. Consequently, users are encouraged to over-estimate resources for their submitted jobs so that the jobs will not be killed due to insufficient resources. This over-estimation wastes HPC resources and leads to inefficient cluster utilization. We created a supervised machine learning model and integrated it into the Slurm resource manager simulator to predict the amount of memory a job requires and the time required to run the computation. Our model uses several machine learning algorithms. Our goal is to integrate and test the proposed supervised machine learning model on Slurm. We used over 10,000 tasks selected from our HPC log files to evaluate the performance and accuracy of our integrated model. The purpose of our work is to increase the performance of Slurm by predicting the memory and time required for each job, and thereby to improve the utilization of the HPC system. Our results indicate that our model helps dramatically reduce computational turnaround time (from five days to ten hours for large jobs), substantially increases utilization of the HPC system, and decreases the average waiting time for submitted jobs.
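
The abstract does not name the algorithms or the log features the authors used, so the following is only a minimal sketch of the general idea: learn from past jobs what a new job will actually consume, instead of trusting the user's over-estimated request. It uses a plain k-nearest-neighbors regressor as a stand-in. The `Job` fields, the `HISTORY` records, and the `predict` function are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Job:
    """One finished job from a (hypothetical) scheduler log."""
    req_mem_gb: float   # memory the user requested
    req_hours: float    # wall time the user requested
    cpus: int           # cores requested
    used_mem_gb: float  # memory actually used (training target)
    run_hours: float    # actual run time (training target)


def features(job):
    """Request-time features that are known before the job runs."""
    return (job.req_mem_gb, job.req_hours, float(job.cpus))


def predict(history, query, k=3):
    """k-NN regression: average the targets of the k most similar past jobs.

    `query` is a (req_mem_gb, req_hours, cpus) tuple for the new job.
    Returns (predicted_mem_gb, predicted_hours).
    """
    def distance(past):
        return sqrt(sum((a - b) ** 2 for a, b in zip(features(past), query)))

    nearest = sorted(history, key=distance)[:k]
    mem = sum(j.used_mem_gb for j in nearest) / len(nearest)
    hours = sum(j.run_hours for j in nearest) / len(nearest)
    return mem, hours


# Synthetic records for illustration only; not data from the paper.
HISTORY = [
    Job(64, 48, 16, 8, 5),
    Job(64, 48, 16, 10, 6),
    Job(60, 40, 16, 9, 4),
    Job(4, 1, 1, 2, 0.5),
    Job(8, 2, 2, 3, 1),
    Job(128, 120, 32, 40, 30),
]

if __name__ == "__main__":
    requested = (64.0, 48.0, 16.0)
    mem, hours = predict(HISTORY, requested, k=3)
    print(f"requested {requested[0]} GB / {requested[1]} h, "
          f"predicted {mem:.1f} GB / {hours:.1f} h")
```

In this toy history, jobs that requested 64 GB and 48 hours used far less of both, so the prediction is well below the request. A scheduler that sizes jobs by the prediction rather than the request could pack more jobs onto the same nodes, which is the mechanism the abstract credits for higher utilization and shorter waiting times. A real deployment would likely add a safety margin above the prediction so that jobs are not killed for exceeding it.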

Similar articles

1
Improving HPC System Performance by Predicting Job Resources via Supervised Machine Learning.
PEARC19 (2019). 2019 Jul;2019. doi: 10.1145/3332186.3333041. Epub 2019 Jul 28.
2
Ensemble Prediction of Job Resources to Improve System Performance for Slurm-Based HPC Systems.
Pract Exp Adv Res Comput (2021). 2021 Jul;2021. doi: 10.1145/3437359.3465574. Epub 2021 Jul 17.
3
AMPRO-HPCC: A Machine-Learning Tool for Predicting Resources on Slurm HPC Clusters.
ADVCOMP Int Conf Adv Eng Comput Appl Sci. 2021 Oct;2021:20-27.
4
Feature Selection for Learning to Predict Outcomes of Compute Cluster Jobs with Application to Decision Support.
Proc (Int Conf Comput Sci Comput Intell). 2020 Dec;2020:1231-1236. doi: 10.1109/csci51800.2020.00230.
5
HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks.
Proc IEEE Int Conf Big Data. 2021 Dec;2021:4113-4118. doi: 10.1109/bigdata52589.2021.9671370. Epub 2022 Jan 13.
6
LigandScout Remote: A New User-Friendly Interface for HPC and Cloud Resources.
J Chem Inf Model. 2019 Jan 28;59(1):31-37. doi: 10.1021/acs.jcim.8b00716. Epub 2018 Dec 27.
7
Developing an efficient scheduling template of a chemotherapy treatment unit: A case study.
Australas Med J. 2011;4(10):575-88. doi: 10.4066/AMJ.2011.837. Epub 2011 Oct 31.
8
Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers.
Sensors (Basel). 2020 Jul 23;20(15):4111. doi: 10.3390/s20154111.
9
Resource estimation in high performance medical image computing.
Neuroinformatics. 2014 Oct;12(4):563-73. doi: 10.1007/s12021-014-9234-5.
10
JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.
PLoS One. 2015 Aug 17;10(8):e0134273. doi: 10.1371/journal.pone.0134273. eCollection 2015.

Cited by

1
PhytoOracle: Scalable, modular phenomics data processing pipelines.
Front Plant Sci. 2023 Mar 6;14:1112973. doi: 10.3389/fpls.2023.1112973. eCollection 2023.
2
AMPRO-HPCC: A Machine-Learning Tool for Predicting Resources on Slurm HPC Clusters.
ADVCOMP Int Conf Adv Eng Comput Appl Sci. 2021 Oct;2021:20-27.
3
Ensemble Prediction of Job Resources to Improve System Performance for Slurm-Based HPC Systems.
Pract Exp Adv Res Comput (2021). 2021 Jul;2021. doi: 10.1145/3437359.3465574. Epub 2021 Jul 17.