Fast acceleration of 2D wave propagation simulations using modern computational accelerators.

Authors

Wang Wei, Xu Lifan, Cavazos John, Huang Howie H, Kay Matthew

Affiliations

Computer and Information Sciences Department, University of Delaware, Newark, Delaware, United States of America.

Electrical and Computer Engineering Department, George Washington University, Washington, DC, United States of America.

Publication information

PLoS One. 2014 Jan 30;9(1):e86484. doi: 10.1371/journal.pone.0086484. eCollection 2014.

Abstract

Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes on GPUs as well as an Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved a speedup of more than 150x over the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation ran on GPUs at least 200x faster than the sequential implementation and 30x faster than a parallelized OpenMP implementation. An OpenMP implementation on the Intel MIC coprocessor provided a speedup of 120x with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media.
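To illustrate what "adding only a few OpenACC pragmas" can look like in practice, the sketch below annotates one explicit time step of a generic 2D diffusion stencil. It is not the authors' code and does not implement the cardiac action potential model; the function name `diffuse_step`, its parameters, and the fixed-boundary treatment are assumptions chosen for illustration. Without an OpenACC-capable compiler the pragma is ignored and the loop runs sequentially.

```c
#include <stddef.h>

/* Hypothetical example: one explicit Euler step of 2D diffusion,
 * u_t = d * (u_xx + u_yy), on an n-by-n grid stored row-major.
 * Boundary cells are copied unchanged (an assumed boundary condition). */
void diffuse_step(const double *restrict u, double *restrict u_next,
                  int n, double d, double dt, double dx)
{
    const double k = d * dt / (dx * dx);

    /* The only accelerator-specific line: offload both loops, copy the
     * input grid to the device and the output grid back to the host. */
    #pragma acc parallel loop collapse(2) copyin(u[0:n*n]) copyout(u_next[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int c = i * n + j;  /* flat index of cell (i, j) */
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1) {
                u_next[c] = u[c];
            } else {
                /* Five-point Laplacian: north, south, west, east neighbours. */
                u_next[c] = u[c] + k * (u[c - n] + u[c + n]
                                        + u[c - 1] + u[c + 1] - 4.0 * u[c]);
            }
        }
    }
}
```

Each output cell depends only on the previous time step, so the iterations are independent and the compiler can map them to GPU threads. A real simulation would typically keep both grids resident on the device across time steps (for example with an enclosing `#pragma acc data` region) rather than copying them on every call as this sketch does.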

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0602/3907428/ec1d45744b07/pone.0086484.g001.jpg
