Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers.

Affiliations

Business Information Systems Department, Arab Academy for Science Technology and Maritime Transport, Cairo 11799, Egypt.

Information & Computing Lab, AtlanTTIC Research Center, Universidade de Vigo, 36310 Vigo, Spain.

Publication information

Sensors (Basel). 2020 Jul 23;20(15):4111. doi: 10.3390/s20154111.

DOI:10.3390/s20154111
PMID:32718093
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7435729/
Abstract

Performance analysis is an essential task in high-performance computing (HPC) systems, and it serves different purposes, such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of key performance indicators (KPIs) to supervise the status of the jobs running in these systems. KPIs provide data about CPU usage, memory usage, network (interface) traffic, and other sensors that monitor the hardware. By analyzing this data, it is possible to obtain insightful information about running jobs, such as their characteristics, performance, and failures. The main contribution of this paper is to identify which metrics (KPIs) are the most appropriate for identifying and classifying different types of jobs according to their behavior in the HPC system. To this end, we applied different clustering techniques (partition and hierarchical clustering algorithms) to a real dataset from the Galician computation center (CESGA). We concluded that (i) the metrics (KPIs) related to network (interface) traffic monitoring provided the best cohesion and separation for clustering HPC jobs, and (ii) hierarchical clustering algorithms were the most suitable for this task. Our approach was validated using a different real dataset from the same HPC center.
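
The idea of comparing KPI groups by the cohesion and separation of the clusters they produce can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data is synthetic (not the CESGA dataset), the two feature sets named `network_kpis` and `cpu_kpis` are hypothetical, average linkage stands in for "hierarchical clustering", and the silhouette coefficient stands in for a cohesion/separation score. The abstract does not state which linkage or validity index the authors used.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, k):
    """Average-linkage agglomerative clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair (j > i)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels

def silhouette(points, labels):
    """Mean silhouette: values near 1 indicate cohesive, well-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)                                # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Synthetic jobs: 20 of hypothetical type A followed by 20 of type B.
random.seed(0)
n = 20
# Hypothetical network KPIs: the two job types behave very differently.
network_kpis = ([(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)] +
                [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(n)])
# Hypothetical CPU KPIs: both job types look alike.
cpu_kpis = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2 * n)]

net_labels = agglomerative(network_kpis, 2)
cpu_labels = agglomerative(cpu_kpis, 2)
sil_net = silhouette(network_kpis, net_labels)
sil_cpu = silhouette(cpu_kpis, cpu_labels)
print(f"silhouette (network KPIs): {sil_net:.2f}")
print(f"silhouette (CPU KPIs):     {sil_cpu:.2f}")
```

In this toy setup the feature set that actually distinguishes the two job types yields the higher silhouette score, which is the kind of evidence one would use to rank KPI groups. On real monitoring data the features would typically need scaling first, and a library implementation would be preferable to this quadratic-memory-free but slow pure-Python version.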

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8d6/7435729/5bc693a9f275/sensors-20-04111-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8d6/7435729/7d2c43b551b5/sensors-20-04111-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8d6/7435729/a152dfea1aa4/sensors-20-04111-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8d6/7435729/d26e1f3a0a00/sensors-20-04111-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8d6/7435729/2489543fdda3/sensors-20-04111-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8d6/7435729/bb22ac40aeb3/sensors-20-04111-g006a.jpg

Similar articles

1. Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers.
Sensors (Basel). 2020 Jul 23;20(15):4111. doi: 10.3390/s20154111.
2. Improving HPC System Performance by Predicting Job Resources via Supervised Machine Learning.
PEARC19 (2019). 2019 Jul;2019. doi: 10.1145/3332186.3333041. Epub 2019 Jul 28.
3. AMPRO-HPCC: A Machine-Learning Tool for Predicting Resources on Slurm HPC Clusters.
ADVCOMP Int Conf Adv Eng Comput Appl Sci. 2021 Oct;2021:20-27.
4. HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences.
Bioinformatics. 2014 Jan 15;30(2):287-8. doi: 10.1093/bioinformatics/btt657. Epub 2013 Nov 9.
5. Scaling bioinformatics applications on HPC.
BMC Bioinformatics. 2017 Dec 28;18(Suppl 14):501. doi: 10.1186/s12859-017-1902-7.
6. Feature Selection for Learning to Predict Outcomes of Compute Cluster Jobs with Application to Decision Support.
Proc (Int Conf Comput Sci Comput Intell). 2020 Dec;2020:1231-1236. doi: 10.1109/csci51800.2020.00230.
7. Ensemble Prediction of Job Resources to Improve System Performance for Slurm-Based HPC Systems.
Pract Exp Adv Res Comput (2021). 2021 Jul;2021. doi: 10.1145/3437359.3465574. Epub 2021 Jul 17.
8. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
9. HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks.
Proc IEEE Int Conf Big Data. 2021 Dec;2021:4113-4118. doi: 10.1109/bigdata52589.2021.9671370. Epub 2022 Jan 13.
10. LigandScout Remote: A New User-Friendly Interface for HPC and Cloud Resources.
J Chem Inf Model. 2019 Jan 28;59(1):31-37. doi: 10.1021/acs.jcim.8b00716. Epub 2018 Dec 27.
