Lindstrom Peter, Isenburg Martin
Lawrence Livermore National Laboratory, USA.
IEEE Trans Vis Comput Graph. 2006 Sep-Oct;12(5):1245-50. doi: 10.1109/TVCG.2006.143.
Large-scale scientific simulation codes typically run on a cluster of CPUs that write/read time steps to/from a single file system. As data sets are constantly growing in size, this increasingly leads to I/O bottlenecks. When the rate at which data is produced exceeds the available I/O bandwidth, the simulation stalls and the CPUs sit idle. Data compression can alleviate this problem by using some CPU cycles to reduce the amount of data that needs to be transferred. Most compression schemes, however, are designed to operate offline and seek to maximize compression, not throughput. Furthermore, they often require quantizing floating-point values onto a uniform integer grid, which disqualifies them from use in applications where exact values must be retained. We propose a simple scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of many applications. A plug-in scheme for data-dependent prediction makes our scheme applicable to a wide variety of data used in visualization, such as unstructured meshes, point sets, images, and voxel grids. We achieve state-of-the-art compression rates and speeds, the latter in part due to an improved entropy coder. We demonstrate that this significantly accelerates I/O throughput in real simulation runs. Unlike previous schemes, our method also adapts well to variable-precision floating-point and integer data.
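The general idea of lossless, prediction-based floating-point compression described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the previous-value predictor is a simple stand-in for the data-dependent plug-in predictors, `zlib` is swapped in for the paper's entropy coder, and all function names (`float_to_ordered_uint`, `compress`, `decompress`) are hypothetical. The sketch assumes 32-bit floats, maps each bit pattern to an order-preserving unsigned integer, stores the integer difference from the prediction, and reverses each step exactly on decode.

```python
import struct
import zlib

MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def float_to_ordered_uint(x):
    """Map a float32 bit pattern to an unsigned int that sorts like the float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Negative floats: flip every bit. Non-negative floats: set the sign bit.
    return (~bits & MASK32) if bits & SIGN32 else (bits | SIGN32)


def ordered_uint_to_float(u):
    """Exact inverse of float_to_ordered_uint."""
    bits = (u & ~SIGN32 & MASK32) if u & SIGN32 else (~u & MASK32)
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return x


def compress(values):
    """Encode float32 values as residuals against a previous-value prediction.

    Assumption: the predictor is simply the last value seen; zlib stands in
    for a dedicated entropy coder.
    """
    prev = 0
    residuals = []
    for x in values:
        u = float_to_ordered_uint(x)
        residuals.append((u - prev) & MASK32)  # difference modulo 2**32
        prev = u
    raw = struct.pack(f"<{len(residuals)}I", *residuals)
    # Prefix the element count so the decoder knows how many values to unpack.
    return struct.pack("<I", len(residuals)) + zlib.compress(raw)


def decompress(blob):
    """Invert compress(): rebuild each value from prediction plus residual."""
    (n,) = struct.unpack_from("<I", blob, 0)
    residuals = struct.unpack(f"<{n}I", zlib.decompress(blob[4:]))
    prev = 0
    out = []
    for r in residuals:
        u = (prev + r) & MASK32
        out.append(ordered_uint_to_float(u))
        prev = u
    return out
```

Because the residuals are computed on integer images of the bit patterns and wrap modulo 2^32, no rounding occurs anywhere and the decoded values match the inputs bit for bit. How much the output shrinks depends on how well the predictor tracks the data; smooth fields yield small residuals that a coder can represent compactly.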