Data Science and Learning Division, Argonne National Laboratory, 9700 Cass Avenue, Lemont, IL, 60439, USA.
X-ray Science Division, Argonne National Laboratory, 9700 Cass Avenue, Lemont, IL, 60439, USA.
Sci Rep. 2022 Mar 29;12(1):5334. doi: 10.1038/s41598-022-09430-3.
Advances in synchrotron light sources, together with the development of focusing optics and detectors, allow nanoscale ptychographic imaging of materials and biological specimens, but the corresponding experiments can yield terabyte-scale volumes of data that impose a heavy burden on the computing platform. Although graphics processing units (GPUs) provide high performance for such large-scale ptychography datasets, a single GPU is typically insufficient for analysis and reconstruction. Several works have leveraged multiple GPUs to accelerate ptychographic reconstruction. However, most of them rely solely on the Message Passing Interface (MPI) to handle communication between GPUs. This approach is inefficient on hardware that has multiple GPUs in a single node, especially when reconstructing a single large projection, because it is not optimized for heterogeneous GPU interconnects that mix low-speed links (e.g., PCIe) and high-speed links (e.g., NVLink). In this paper, we provide an optimized intranode multi-GPU implementation that can efficiently solve large-scale ptychographic reconstruction problems. We focus on the maximum likelihood reconstruction problem, solved with a conjugate gradient (CG) method, and propose a novel hybrid parallelization model to address the performance bottlenecks in the CG solver. Accordingly, we have developed a tool called PtyGer (Ptychographic GPU(multiple)-based reconstruction) that implements our hybrid parallelization model. A comprehensive evaluation verifies that PtyGer fully preserves the original algorithm's accuracy while achieving outstanding intranode GPU scalability.