Institut de Química Avançada de Catalunya (IQAC-CSIC), 08034 Barcelona, Spain.
Department of Mathematics and Informatics, Universitat de Barcelona, 08007 Barcelona, Spain.
Int J Mol Sci. 2022 Sep 27;23(19):11408. doi: 10.3390/ijms231911408.
X-ray crystallography is a powerful method that has contributed significantly to our understanding of the biological function of proteins and other molecules. The method relies on the production of crystals, which are usually the bottleneck of the process. For some molecules, no crystals have been obtained, or those obtained were insufficient. Other systems cannot be crystallized at all: nanoparticles, for example, cannot be treated by the usual crystallographic methods because of their dimensions. To address this, the whole pair distribution function has been proposed to bridge the gap between Bragg and Debye scattering theories. Fitting a model requires computing the spectra of several different constructs, each composed of millions of particles, with a particle-pair or particle-particle (pp) distance algorithm. Using this computation as a test bench for current field-programmable gate array (FPGA) technology, we evaluate how the parallel computing capability of FPGAs can be exploited to reduce computation time. We present two solutions to the problem using two state-of-the-art FPGA technologies. In the first, the main C program uses OmpSs (a high-level programming model developed at the Barcelona Supercomputing Center that enables task offloading to different high-performance computing devices) to invoke tasks, and the kernels are built with OpenCL using reduced data sizes to save transmission time. The second approach uses task and data parallelism to operate on data locally and update data globally in a decoupled task. Benchmarks were run on an Intel D5005 Programmable Acceleration Card, which computed a model of 2 million particles in 81.57 s (24.5 billion atom pairs per second, bapps), and on a ZU102, which took 115.31 s. In our last test, on a more recent Alveo U200 board, the computation took 34.68 s (57.67 bapps).
In this study, we analyze the results in terms of the classic metrics of speed-up and efficiency and suggest future improvements aimed at reducing the overall job time.
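The reported figures can be cross-checked with a little arithmetic: 2 million particles give N(N-1)/2, roughly 2 × 10^12 distinct pairs, and dividing by the run time yields the quoted bapps values. The sketch below also encodes the classic definitions of speed-up (reference run time divided by improved run time) and efficiency (speed-up divided by the number of processing elements). The function names are hypothetical, and because this abstract gives neither the serial reference times nor the processing-element counts used in the study, the example only compares two of the reported FPGA times.

```c
#include <assert.h>

/* Distinct atom pairs for n particles: n(n-1)/2. */
static unsigned long long pair_count(unsigned long long n)
{
    return n < 2 ? 0ULL : n * (n - 1) / 2;
}

/* Throughput in billions of atom pairs per second (bapps). */
double bapps(unsigned long long n_particles, double seconds)
{
    assert(seconds > 0.0);
    return (double)pair_count(n_particles) / seconds / 1e9;
}

/* Classic speed-up: reference run time divided by improved run time. */
double speedup(double t_reference, double t_improved)
{
    assert(t_improved > 0.0);
    return t_reference / t_improved;
}

/* Classic efficiency: speed-up divided by the number of processing
   elements (for example, replicated kernels) used to obtain it. */
double efficiency(double s, unsigned processing_elements)
{
    assert(processing_elements > 0);
    return s / (double)processing_elements;
}
```

For instance, `bapps(2000000ULL, 34.68)` gives about 57.67, matching the Alveo U200 figure, and `bapps(2000000ULL, 115.31)` gives about 17.3 for the ZU102 run. `speedup(81.57, 34.68)` is about 2.35, meaning the U200 run was roughly 2.35 times faster than the D5005 run.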