Düben Peter D, Russell Francis P, Niu Xinyu, Luk Wayne, Palmer T N
AOPP, Department of Physics, University of Oxford, Oxford, UK.
Department of Computing, Imperial College London, London, UK.
J Adv Model Earth Syst. 2015 Sep;7(3):1393-1408. doi: 10.1002/2015MS000494. Epub 2015 Sep 18.
Programmable hardware, in particular Field Programmable Gate Arrays (FPGAs), promises a significant increase in computational performance for simulations in geophysical fluid dynamics compared with CPUs of similar power consumption. FPGAs allow adjusting the representation of floating-point numbers to specific application needs. We analyze the performance-precision trade-off on FPGA hardware for the two-scale Lorenz '95 model. We scale the size of this toy model to that of a high-performance computing application in order to make meaningful performance tests. We identify the minimal level of precision at which changes in model results are not significant compared with a maximal precision version of the model and find that this level is very similar for cases where the model is integrated for very short or long intervals. It is therefore a useful approach to investigate model errors due to rounding errors for very short simulations (e.g., 50 time steps) to obtain a range for the level of precision that can be used in expensive long-term simulations. We also show that an approach to reduce precision with increasing forecast time, when model errors are already accumulated, is very promising. We show that a speed-up of 1.9 times is possible in comparison to FPGA simulations in single precision if precision is reduced with no strong change in model error. The single-precision FPGA setup shows a speed-up of 2.8 times in comparison to our model implementation on two 6-core CPUs for large model setups.
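The short-run precision test described in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' FPGA implementation. It assumes the standard two-scale Lorenz equations with commonly used parameter values (F = 10, h = 1, c = 10, b = 10), a fourth-order Runge-Kutta scheme, a small grid (K = 8, J = 4), and a time step of 0.001; none of these choices is taken from the paper. Reduced precision is emulated by rounding the significand of double-precision values to a chosen number of bits after each tendency evaluation and state update, which is coarser than rounding every arithmetic operation as custom hardware would. The names `round_to_bits`, `tendencies`, `rk4_step`, and `short_run_error` are hypothetical.

```python
import math

def round_to_bits(x, bits):
    """Round x to `bits` significand bits (emulates a narrower float format)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** bits
    return math.ldexp(round(m * scale) / scale, e)

def tendencies(X, Y, F, h, c, b, bits):
    """Two-scale Lorenz tendencies with cyclic boundaries, rounded to `bits`."""
    K, n = len(X), len(Y)
    J = n // K
    dX = []
    for k in range(K):
        coupling = sum(Y[k * J:(k + 1) * J])
        val = -X[k - 1] * (X[k - 2] - X[(k + 1) % K]) - X[k] + F - (h * c / b) * coupling
        dX.append(round_to_bits(val, bits))
    dY = []
    for i in range(n):
        val = (-c * b * Y[(i + 1) % n] * (Y[(i + 2) % n] - Y[i - 1])
               - c * Y[i] + (h * c / b) * X[i // J])
        dY.append(round_to_bits(val, bits))
    return dX, dY

def axpy(a, u, v, bits):
    """Return u + a*v element-wise, rounded to `bits`."""
    return [round_to_bits(ui + a * vi, bits) for ui, vi in zip(u, v)]

def rk4_step(X, Y, dt, bits, **p):
    """One classical RK4 step (an assumed scheme) with rounding at each stage."""
    k1x, k1y = tendencies(X, Y, bits=bits, **p)
    k2x, k2y = tendencies(axpy(dt / 2, X, k1x, bits), axpy(dt / 2, Y, k1y, bits), bits=bits, **p)
    k3x, k3y = tendencies(axpy(dt / 2, X, k2x, bits), axpy(dt / 2, Y, k2y, bits), bits=bits, **p)
    k4x, k4y = tendencies(axpy(dt, X, k3x, bits), axpy(dt, Y, k3y, bits), bits=bits, **p)
    def combine(s, s1, s2, s3, s4):
        return [round_to_bits(v + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4), bits)
                for v, a1, a2, a3, a4 in zip(s, s1, s2, s3, s4)]
    return combine(X, k1x, k2x, k3x, k4x), combine(Y, k1y, k2y, k3y, k4y)

def short_run_error(bits, steps=50, K=8, J=4, dt=0.001):
    """RMS difference in X between a `bits`-bit run and a 53-bit reference."""
    params = dict(F=10.0, h=1.0, c=10.0, b=10.0)   # assumed standard values
    X0 = [5.0 + 5.0 * math.sin(k) for k in range(K)]
    Y0 = [0.1 * math.sin(i) for i in range(K * J)]
    Xr, Yr = list(X0), list(Y0)   # reference run (full double significand)
    Xl, Yl = list(X0), list(Y0)   # reduced-precision run
    for _ in range(steps):
        Xr, Yr = rk4_step(Xr, Yr, dt, 53, **params)
        Xl, Yl = rk4_step(Xl, Yl, dt, bits, **params)
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(Xl, Xr)) / K)

if __name__ == "__main__":
    for bits in (10, 16, 24, 30, 53):
        print(bits, short_run_error(bits))
```

Scanning `bits` over a range and comparing the resulting errors against a tolerance of interest gives a rough candidate range of precision levels after only 50 time steps, in the spirit of the short-simulation test the abstract recommends.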