AMPRO-HPCC: A Machine-Learning Tool for Predicting Resources on Slurm HPC Clusters.

Authors

Tanash Mohammed, Andresen Daniel, Hsu William

Affiliation

Computer Science Department, Kansas State University, Manhattan, United States.

Publication

ADVCOMP Int Conf Adv Eng Comput Appl Sci. 2021 Oct;2021:20-27.

PMID: 36760802
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9906793/

Abstract

Determining resource allocations (memory and time) for jobs submitted to High Performance Computing (HPC) systems is challenging, even for computer scientists. HPC users are strongly encouraged to overestimate the resources they request so that their jobs are not killed for lack of resources. This overestimation stems from the wide variety of HPC applications and environment configuration options, and from users' limited knowledge of the complex structure of HPC systems. The result is wasted HPC resources, lower system utilization, and longer waiting and turnaround times for submitted jobs. In this paper, we introduce our first-ever fully offline, fully automated, stand-alone, open-source Machine Learning (ML) tool that helps users predict the memory and time requirements of the jobs they submit to the cluster. The tool implements six discriminative ML models from scikit-learn and Microsoft LightGBM and applies them to historical accounting data (sacct data) from the Simple Linux Utility for Resource Management (Slurm). We tested the tool on sacct data from Kansas State University's HPC cluster (Beocat), covering January 2019 to March 2021 and containing around 17.6 million jobs. Our results show that the tool achieves high predictive accuracy (0.72 using LightGBM to predict memory and 0.74 using Random Forest to predict time), dramatically reduces the average waiting time and turnaround time of submitted jobs, and increases the utilization of HPC resources. Hence, the tool also reduces the power consumption of the HPC resources.
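The abstract describes training models on Slurm accounting (sacct) data. As a rough illustration of the data-preparation step only, the sketch below turns `sacct --parsable2 --noheader` output into numeric features (requested memory, time limit and CPUs) and targets (peak memory and elapsed time). The chosen fields, the feature set, the helper names (`mem_to_mb`, `time_to_minutes`, `parse_sacct`) and the sample records are hypothetical assumptions for illustration, not the features or code used by AMPRO-HPCC.

```python
import re

# Hypothetical field list (an assumption, not AMPRO-HPCC's feature set). A matching
# command would be something like:
#   sacct --allusers --parsable2 --noheader --starttime=2019-01-01 \
#         --format=JobID,ReqMem,Timelimit,ReqCPUS,MaxRSS,Elapsed,State
FIELDS = ["JobID", "ReqMem", "Timelimit", "ReqCPUS", "MaxRSS", "Elapsed", "State"]
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 ** 2}


def mem_to_mb(value, cpus=1):
    """Convert strings such as '4000M', '3.5G', '4000Mn', '2000Mc', '100K' to MB.

    A trailing 'c' (memory per CPU in older Slurm versions) is scaled by `cpus`;
    a bare number is assumed to already be in MB (an assumption).
    """
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)([nc]?)", value or "")
    if m is None:
        return None
    number, unit, scope = m.groups()
    mb = float(number) * _UNIT_TO_MB.get(unit, 1.0)
    return mb * cpus if scope == "c" else mb


def time_to_minutes(value):
    """Convert '[D-]HH:MM:SS' or 'MM:SS[.mmm]' to minutes; None for 'UNLIMITED' etc."""
    if not value or not value[0].isdigit():
        return None
    days, _, clock = value.rpartition("-")
    parts = [float(p) for p in clock.split(":")]
    if len(parts) not in (2, 3):
        return None
    hours, minutes, seconds = ([0.0] + parts)[-3:]
    return int(days or 0) * 1440 + hours * 60 + minutes + seconds / 60


def parse_sacct(text):
    """Return (X, y_mem, y_time) for COMPLETED jobs in pipe-delimited sacct text."""
    jobs, peak_rss = {}, {}
    for line in text.strip().splitlines():
        rec = dict(zip(FIELDS, line.split("|")))
        job_id, _, step = rec["JobID"].partition(".")
        if step:  # job step line (e.g. "1001.batch"): MaxRSS is reported per step
            rss = mem_to_mb(rec["MaxRSS"])
            if rss is not None:
                peak_rss[job_id] = max(rss, peak_rss.get(job_id, 0.0))
        elif rec["State"] == "COMPLETED":  # allocation line of a finished job
            jobs[job_id] = rec
    X, y_mem, y_time = [], [], []
    for job_id, rec in jobs.items():
        cpus = int(rec["ReqCPUS"])
        features = [mem_to_mb(rec["ReqMem"], cpus), time_to_minutes(rec["Timelimit"]), cpus]
        elapsed = time_to_minutes(rec["Elapsed"])
        if None in features or elapsed is None or job_id not in peak_rss:
            continue  # skip unlimited/unparsable requests and jobs without step data
        X.append(features)
        y_mem.append(peak_rss[job_id])
        y_time.append(elapsed)
    return X, y_mem, y_time


# Made-up sample records (not Beocat data) in --parsable2 format.
SAMPLE = """\
1001|4000Mn|01:00:00|4||00:20:00|COMPLETED
1001.batch|4000Mn||4|1500M|00:20:00|COMPLETED
1002|2000Mc|2-00:00:00|2||1-03:30:00|COMPLETED
1002.batch|2000Mc||2|3.5G|1-03:30:00|COMPLETED
1003|8G|UNLIMITED|1||00:05:00|COMPLETED
1003.batch|8G||1|100K|00:05:00|COMPLETED
1004|4000M|00:30:00|1||00:30:00|TIMEOUT
"""

if __name__ == "__main__":
    X, y_mem, y_time = parse_sacct(SAMPLE)
    print(X)       # [[4000.0, 60.0, 4], [4000.0, 2880.0, 2]]
    print(y_mem)   # [1500.0, 3584.0]
    print(y_time)  # [20.0, 1650.0]
```

The resulting lists could then be passed to a scikit-learn estimator, for example `RandomForestRegressor().fit(X, y_time)` from `sklearn.ensemble`. The abstract states that the tool implements six scikit-learn and LightGBM models; their exact features and settings are not reproduced in this sketch.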

Similar articles

1. AMPRO-HPCC: A Machine-Learning Tool for Predicting Resources on Slurm HPC Clusters.
   ADVCOMP Int Conf Adv Eng Comput Appl Sci. 2021 Oct;2021:20-27.
2. Ensemble Prediction of Job Resources to Improve System Performance for Slurm-Based HPC Systems.
   Pract Exp Adv Res Comput (2021). 2021 Jul;2021. doi: 10.1145/3437359.3465574. Epub 2021 Jul 17.
3. Improving HPC System Performance by Predicting Job Resources via Supervised Machine Learning.
   PEARC19 (2019). 2019 Jul;2019. doi: 10.1145/3332186.3333041. Epub 2019 Jul 28.
4. Feature Selection for Learning to Predict Outcomes of Compute Cluster Jobs with Application to Decision Support.
   Proc (Int Conf Comput Sci Comput Intell). 2020 Dec;2020:1231-1236. doi: 10.1109/csci51800.2020.00230.
5. HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks.
   Proc IEEE Int Conf Big Data. 2021 Dec;2021:4113-4118. doi: 10.1109/bigdata52589.2021.9671370. Epub 2022 Jan 13.
6. Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers.
   Sensors (Basel). 2020 Jul 23;20(15):4111. doi: 10.3390/s20154111.
7. LigandScout Remote: A New User-Friendly Interface for HPC and Cloud Resources.
   J Chem Inf Model. 2019 Jan 28;59(1):31-37. doi: 10.1021/acs.jcim.8b00716. Epub 2018 Dec 27.
8. Developing an efficient scheduling template of a chemotherapy treatment unit: A case study.
   Australas Med J. 2011;4(10):575-88. doi: 10.4066/AMJ.2011.837. Epub 2011 Oct 31.
9. SPIM workflow manager for HPC.
   Bioinformatics. 2019 Oct 1;35(19):3875-3876. doi: 10.1093/bioinformatics/btz140.
10. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.
   PLoS One. 2015 Aug 17;10(8):e0134273. doi: 10.1371/journal.pone.0134273. eCollection 2015.
