Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP.

Affiliation

Departamento de Informática e Ingeniería de Sistemas - Aragón Institute for Engineering Research (I3A), Universidad de Zaragoza, Zaragoza, Spain.

Publication information

PLoS One. 2019 Aug 1;14(8):e0220135. doi: 10.1371/journal.pone.0220135. eCollection 2019.

DOI: 10.1371/journal.pone.0220135

PMID: 31369592

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6675054/

Abstract

SPEC CPU is one of the most common benchmark suites used in computer architecture research. CPU2017 has recently been released to replace CPU2006. In this paper we present a detailed evaluation of the memory hierarchy performance for both the CPU2006 and single-threaded CPU2017 benchmarks. The experiments were executed on an Intel Xeon Skylake-SP, which is the first Intel processor to implement a mostly non-inclusive last-level cache (LLC). We present a classification of the benchmarks according to their memory pressure and analyze the performance impact of different LLC sizes. We also test all the hardware prefetchers, showing that they improve performance in most of the benchmarks. After comprehensive experimentation, we can highlight the following conclusions: i) almost half of the SPEC CPU benchmarks have very low miss ratios in the second- and third-level caches, even with small LLC sizes and without hardware prefetching; ii) overall, the SPEC CPU2017 benchmarks demand even fewer memory hierarchy resources than the SPEC CPU2006 ones; iii) hardware prefetching is very effective in reducing LLC misses for most benchmarks, even with the smallest LLC size; and iv) from the memory hierarchy standpoint, the methodologies commonly used to select benchmarks or simulation points do not guarantee representative workloads.

Similar articles

Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP.

PLoS One. 2019 Aug 1;14(8):e0220135. doi: 10.1371/journal.pone.0220135. eCollection 2019.

Correction: Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP.

PLoS One. 2024 May 9;19(5):e0303712. doi: 10.1371/journal.pone.0303712. eCollection 2024.

Challenges and opportunities for the simulation of calcium waves on modern multi-core and many-core parallel computing platforms.

Int J Numer Method Biomed Eng. 2021 Nov;37(11):e3244. doi: 10.1002/cnm.3244. Epub 2019 Sep 2.

Combining instruction prefetching with partial cache locking to improve WCET in real-time systems.

PLoS One. 2013 Dec 26;8(12):e82975. doi: 10.1371/journal.pone.0082975. eCollection 2013.

Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Jul-Aug;17(4):1093-1104. doi: 10.1109/TCBB.2018.2884701. Epub 2018 Dec 6.

Off-chip prefetching based on Hidden Markov Model for non-volatile memory architectures.

PLoS One. 2021 Sep 14;16(9):e0257047. doi: 10.1371/journal.pone.0257047. eCollection 2021.

Analysing the performance of personal computers based on Intel microprocessors for sequence aligning bioinformatics applications.

Int J Bioinform Res Appl. 2007;3(2):187-205. doi: 10.1504/IJBRA.2007.013602.

L2C2: Last-level compressed-contents non-volatile cache and a procedure to forecast performance and lifetime.

PLoS One. 2023 Feb 7;18(2):e0278346. doi: 10.1371/journal.pone.0278346. eCollection 2023.

Cooperative and out-of-core execution of the irregular wavefront propagation pattern on hybrid machines with Intel Xeon Phi™.

Concurr Comput. 2018 Jul 25;30(14). doi: 10.1002/cpe.4425. Epub 2018 Jan 24.

Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.

IEEE Trans Nanobioscience. 2015 Jun;14(4):429-439. doi: 10.1109/TNB.2015.2403776. Epub 2015 Mar 5.

Cited by

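Correction: Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP.

PLoS One. 2024 May 9;19(5):e0303712. doi: 10.1371/journal.pone.0303712. eCollection 2024.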

L2C2: Last-level compressed-contents non-volatile cache and a procedure to forecast performance and lifetime.

PLoS One. 2023 Feb 7;18(2):e0278346. doi: 10.1371/journal.pone.0278346. eCollection 2023.
