Kang Donghe, Rübel Oliver, Byna Suren, Blanas Spyros
The Ohio State University.
Lawrence Berkeley National Laboratory.
Proc IPDPS (Conf). 2020 May;2020:906-915. doi: 10.1109/ipdps47924.2020.00097. Epub 2020 Jul 14.
Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters, and object-level access patterns. As a consequence, I/O optimization is predominantly an ad-hoc process performed by application developers, who are often domain scientists with limited desire to delve into the nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and optimizations. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics).
The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases, so models that focus only on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts which of HDF5 and Zarr is the faster storage library 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
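The request cascade described above can be sketched concretely. The snippet below is an illustrative toy (not the paper's model, and independent of the HDF5 or Zarr APIs): it computes which chunks of a chunked array a single object-level hyperslab read intersects, showing how one application-level request fans out into multiple storage-block operations that a library-level cost model must account for.

```python
from itertools import product

def chunks_touched(start, stop, chunk_shape):
    """Return the chunk coordinates a hyperslab selection [start, stop) intersects.

    A single object-level read touches every chunk overlapping the selection;
    each touched chunk may become a separate file-system or object request.
    Illustrative sketch only -- not the paper's cost model.
    """
    ranges = [
        range(s // c, (e - 1) // c + 1)     # first and last chunk index per dim
        for s, e, c in zip(start, stop, chunk_shape)
    ]
    return list(product(*ranges))           # cross product over dimensions

# Reading rows 0-99, columns 90-209 of an array stored in 100x100 chunks
# touches three chunks even though the application issued one request.
print(chunks_touched((0, 90), (100, 210), (100, 100)))
# -> [(0, 0), (0, 1), (0, 2)]
```

Under Zarr's storage model each touched chunk typically maps to a separate file, while under HDF5 the chunks are regions of one file; either way, the memory-copy cost of assembling the selection from the touched chunks is incurred in addition to the raw I/O time.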