Uppsala Multidisciplinary Center for Advanced Computational Methods (UPPMAX) and Department of Materials Chemistry, Uppsala University, Box 538, SE-751 21 Uppsala, Sweden.
J Mol Model. 2011 Oct;17(10):2669-85. doi: 10.1007/s00894-010-0948-5. Epub 2011 Jan 26.
We present general algorithms for the compression of molecular dynamics trajectories. The standard ways to store MD trajectories as text or as raw binary floating point numbers result in very large files when efficient simulation programs are used on supercomputers. Our algorithms are based on the observation that differences in atomic coordinates/velocities, in either time or space, are generally smaller than the absolute values of the coordinates/velocities. Also, it is often possible to store values at a lower precision. We apply several compression schemes to compress the resulting differences further. The most efficient algorithms developed here use a block sorting algorithm in combination with Huffman coding. Depending on the frequency of storage of frames in the trajectory, either space, time, or combinations of space and time differences are usually the most efficient. We compare the efficiency of our algorithms with each other and with other algorithms present in the literature for various systems: liquid argon, water, a virus capsid solvated in 15 mM aqueous NaCl, and solid magnesium oxide. We perform tests to determine how much precision is necessary to obtain accurate structural and dynamic properties, and we benchmark a parallelized implementation of the algorithms. We obtain compression ratios (compared to single precision floating point) of 1:3.3 to 1:35, depending on the frequency of storage of frames and the system studied.
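The pipeline described in the abstract (reduce precision, take differences, then entropy-code the result) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a fixed precision of 0.001 in whatever length unit the coordinates use, takes differences in time only, and uses Python's standard `bz2` module as a stand-in coder, since bzip2 also combines block sorting (the Burrows-Wheeler transform) with Huffman coding. The function names `quantize`, `compress_trajectory`, `decompress_trajectory`, and `make_demo_trajectory` are hypothetical, and the demo trajectory is a synthetic random walk, not one of the systems studied in the paper.

```python
import bz2
import random
import struct


def quantize(frame, precision):
    """Map each coordinate to the nearest integer multiple of `precision`."""
    return [round(x / precision) for x in frame]


def compress_trajectory(frames, precision=0.001):
    """Quantize each frame, store time differences, then bz2-compress them.

    `frames` is a list of equally long lists of floats (flattened x, y, z).
    The first frame is stored as a difference against an all-zero frame.
    """
    prev = [0] * len(frames[0])
    deltas = []
    for frame in frames:
        q = quantize(frame, precision)
        deltas.extend(a - b for a, b in zip(q, prev))
        prev = q
    # Pack the integer differences as little-endian 32-bit signed values.
    raw = struct.pack(f"<{len(deltas)}i", *deltas)
    return bz2.compress(raw)


def decompress_trajectory(blob, n_coords, precision=0.001):
    """Invert compress_trajectory; values are exact up to the quantization."""
    raw = bz2.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    frames = []
    prev = [0] * n_coords
    for start in range(0, len(deltas), n_coords):
        # Undo the time difference by accumulating onto the previous frame.
        q = [p + d for p, d in zip(prev, deltas[start:start + n_coords])]
        frames.append([v * precision for v in q])
        prev = q
    return frames


def make_demo_trajectory(n_atoms, n_frames, seed=0):
    """Synthetic random-walk trajectory (illustrative only, not real MD)."""
    rng = random.Random(seed)
    frame = [rng.uniform(0.0, 5.0) for _ in range(3 * n_atoms)]
    frames = [frame]
    for _ in range(n_frames - 1):
        frame = [x + rng.gauss(0.0, 0.01) for x in frame]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    frames = make_demo_trajectory(n_atoms=100, n_frames=50)
    blob = compress_trajectory(frames)
    float32_size = 4 * len(frames) * len(frames[0])
    print(f"float32 bytes: {float32_size}, compressed bytes: {len(blob)}")
```

The sketch is lossy by design: each restored coordinate can differ from the original by up to half the chosen precision. The ratio it achieves on synthetic data says nothing about the ratios reported above, which depend on the system and on how often frames are stored.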