Medical Faculty of Mannheim, University of Heidelberg, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany.
Phys Med Biol. 2012 Mar 7;57(5):1217-29. doi: 10.1088/0031-9155/57/5/1217. Epub 2012 Feb 14.
We present GMC (GPU Monte Carlo), a GPU implementation of the low energy (<100 GeV) electromagnetic part of the Geant4 Monte Carlo code, written with the NVIDIA® CUDA programming interface. The classes for electron and photon interactions were implemented, together with a new parallel particle transport engine. Particles are processed interaction by interaction rather than history by history: every history is divided into steps, which are then calculated in parallel by different kernels. The geometry package is currently limited to voxelized geometries. A modified parallel Mersenne twister was used to generate random numbers, and a random number repetition method on the GPU was introduced. All phantom results showed very good agreement between GPU and CPU simulation, with gamma index pass rates of >97.5% for a 2%/2 mm gamma criterion. The mean speed-up factor on one GTX 580, over all cases and compared to Geant4 on one CPU core, was 4860. The mean number of histories per millisecond on the GPU over all cases was 658, leading to a total simulation time of 349 s for one intensity-modulated radiation therapy dose distribution. In conclusion, Geant4-based Monte Carlo dose calculations were significantly accelerated on the GPU.
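The 2%/2 mm gamma criterion used for the GPU versus CPU comparison can be illustrated with a minimal one-dimensional sketch. This is not the evaluation code used for GMC; it assumes a global dose normalisation (the 2% is taken relative to the maximum reference dose), a uniform grid, and no interpolation between grid points. The function name `gamma_pass_rate` and its parameters are hypothetical and chosen only for this illustration.

```python
import math


def gamma_pass_rate(ref, ev, spacing_mm, dose_tol=0.02, dta_mm=2.0):
    """Return the percentage of reference points with gamma <= 1.

    Assumptions (not taken from the abstract): 1D profiles on the same
    uniform grid, global normalisation to the maximum reference dose,
    and a brute-force search over grid points without interpolation.
    """
    if len(ref) != len(ev) or not ref:
        raise ValueError("profiles must be non-empty and of equal length")

    # Absolute dose tolerance, e.g. 2% of the maximum reference dose.
    dose_crit = dose_tol * max(ref)
    if dose_crit <= 0:
        raise ValueError("reference profile must contain a positive dose")

    passed = 0
    for i, d_ref in enumerate(ref):
        best_gamma_sq = math.inf
        for j, d_ev in enumerate(ev):
            dist = (j - i) * spacing_mm   # spatial offset in mm
            diff = d_ev - d_ref           # dose difference
            gamma_sq = (dist / dta_mm) ** 2 + (diff / dose_crit) ** 2
            best_gamma_sq = min(best_gamma_sq, gamma_sq)
        # A point passes when the minimum gamma is at most 1.
        if best_gamma_sq <= 1.0:
            passed += 1

    return 100.0 * passed / len(ref)


if __name__ == "__main__":
    cpu_profile = [10.0, 20.0, 30.0, 20.0, 10.0]   # made-up reference doses
    gpu_profile = [d * 1.01 for d in cpu_profile]  # 1% higher everywhere
    print(gamma_pass_rate(cpu_profile, gpu_profile, spacing_mm=1.0))
```

In this sketch a pass rate above 97.5% would mean that more than 97.5% of the reference points find an evaluated point within the combined dose and distance tolerance. A clinical comparison would work on 3D dose grids and usually interpolates between voxels.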