Bicer Tekin, Gürsoy Doğa, Kettimuthu Rajkumar, De Carlo Francesco, Foster Ian T
Mathematics and Computer Science Division, Argonne National Laboratory, USA.
Advanced Photon Source, X-ray Science Division, Argonne National Laboratory, USA.
J Synchrotron Radiat. 2016 Jul;23(Pt 4):997-1005. doi: 10.1107/S1600577516007980. Epub 2016 Jun 15.
New technological advancements in synchrotron light sources enable data acquisition at unprecedented levels. This emerging trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications based on iterative processing, such as tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, the focus is on time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources. Two main challenges are considered: (i) modeling the performance of tomographic reconstruction workflows and (ii) executing these workflows transparently on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize user interaction with the underlying infrastructure. The system uses Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated using three high-performance computing resources and two storage resources, all of which are geographically distributed. Workflows with different computational requirements were created using two compute-intensive tomographic reconstruction algorithms.
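The abstract splits workflow execution time into three stages: transfer, queue wait, and computation. A minimal sketch of how such a model can drive resource selection, assuming the simplest possible composition (an additive sum of the three stages), transfer time as data size over bandwidth, and compute time scaling linearly with slices and iterations and perfectly across cores. The `Resource` fields, the linear cost terms, and all numbers are hypothetical illustrations, not the authors' actual models.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Resource:
    """Hypothetical description of one compute resource (all fields assumed)."""
    name: str
    bandwidth_gbps: float      # assumed effective storage-to-compute rate, gigabits/s
    expected_queue_s: float    # assumed estimate of batch-queue wait, seconds
    sec_per_slice_iter: float  # assumed time for one iteration on one slice, one core
    cores: int


def estimate_time(res: Resource, data_gb: float, slices: int, iterations: int) -> float:
    """Additive three-stage estimate in seconds: transfer + queue wait + compute."""
    transfer_s = data_gb * 8 / res.bandwidth_gbps  # gigabytes -> gigabits
    # Assumes perfect linear scaling of the reconstruction across cores.
    compute_s = slices * iterations * res.sec_per_slice_iter / res.cores
    return transfer_s + res.expected_queue_s + compute_s


def select_resource(resources: Sequence[Resource], data_gb: float,
                    slices: int, iterations: int) -> Resource:
    """Pick the resource with the smallest estimated end-to-end time."""
    return min(resources, key=lambda r: estimate_time(r, data_gb, slices, iterations))


# Illustrative usage with made-up resources and workload.
cluster_a = Resource("cluster-a", bandwidth_gbps=8, expected_queue_s=60,
                     sec_per_slice_iter=0.5, cores=100)
cluster_b = Resource("cluster-b", bandwidth_gbps=2, expected_queue_s=5,
                     sec_per_slice_iter=0.5, cores=20)
best = select_resource([cluster_a, cluster_b], data_gb=10, slices=1000, iterations=10)
print(best.name, estimate_time(best, 10, 1000, 10))  # cluster-a 120.0
```

Here `cluster-b` has the shorter queue, but its slower link and fewer cores make it the worse choice overall, which is the kind of trade-off a per-stage model is meant to expose.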
Experimental evaluation shows that the proposed models and system can be used to select the optimal resources, which in turn can provide up to a 3.13× speedup (on the tested resources). Moreover, the error rates of the models range from 2.1 to 23.3% (with respect to workflow execution times), and the accuracy of the model estimations increases as the computational demands of reconstruction tasks grow.
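The two headline figures can be read with standard definitions, assumed here because the abstract does not spell them out: the relative error of an estimated execution time against the measured one, and speedup as the ratio of a baseline execution time to that of the selected resource. The function names and sample numbers below are hypothetical and only illustrate the arithmetic.

```python
def relative_error_pct(estimated_s: float, measured_s: float) -> float:
    """Assumed definition: |estimate - measurement| / measurement, in percent."""
    return abs(estimated_s - measured_s) / measured_s * 100.0


def speedup(baseline_s: float, selected_s: float) -> float:
    """Assumed definition: how many times faster the selected run finishes."""
    return baseline_s / selected_s


# Illustrative numbers only, not results from the study.
print(relative_error_pct(125.0, 100.0))  # 25.0
print(speedup(300.0, 120.0))             # 2.5
```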