Department of Applied Bioengineering and Research Institute for Convergence Science, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 08826, Republic of Korea.
Advanced Institute of Convergence Technology, Seoul National University, Suwon 16229, Republic of Korea.
Phys Med Biol. 2024 Aug 14;69(17). doi: 10.1088/1361-6560/ad694f.
This work aims to develop a graphics processing unit (GPU)-accelerated Monte Carlo (MC) code for the coupled transport of photons, electrons/positrons, and neutrons over a broad range of energies for medical applications. By separating the MC evolution of radiation into source, transport, and interaction kernels, branch divergence was alleviated. Memory coalescence was achieved by vectorizing the access pattern used to archive secondary particles. To further accelerate particle tracking, the ray-tracing hardware acceleration of the Nvidia OptiX framework was applied. For photons and electrons/positrons, the EGSnrc interaction modules were ported in a GPU-optimized configuration. For neutrons, group-wise transport based on NJOY21 preprocessed data was implemented. The developed code was validated against the CPU-based FLUKA. Neutron, x-ray, and electron beams incident on water and ICRP phantoms were simulated. The neutron energy groups and the photon and electron transport parameters were set to be the same in both codes. A single Nvidia RTX 4090 card was used for the developed code, while all 20 threads of a single Intel Core i9-10900K node were used for FLUKA. The number of histories was set to ensure that statistical uncertainties were lower than 2% for all voxels whose doses were larger than 20% of the maximum. In all cases, the voxel dose differences between the codes were within 2.5%. For photons and electrons, the developed code was 150 to 300 times faster than FLUKA in both geometries. For neutrons, the code was 80 and 135 times faster than FLUKA in the water and ICRP phantoms, respectively. This study offers an appropriate solution for the uncoalesced memory access and branch divergence commonly encountered in coupled MC transport on the GPU architecture. The substantial reduction in computing time, together with the accuracy shown in this study, is promising for routine clinical use of MC simulations.
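The acceptance criteria quoted above (uncertainty below 2% and dose difference within 2.5% for voxels above 20% of the maximum dose) can be illustrated with a minimal sketch. It is not the authors' analysis code. The function names, the flat-list voxel layout, the choice to build the voxel mask from the reference dose, and the normalization of the difference to the local reference dose are all assumptions made for illustration.

```python
def high_dose_mask(dose, threshold_fraction=0.20):
    """Flag voxels whose dose exceeds a fraction of the maximum dose."""
    cutoff = threshold_fraction * max(dose)
    return [d > cutoff for d in dose]


def check_agreement(dose_test, dose_ref, rel_unc_test, rel_unc_ref,
                    unc_limit=0.02, diff_limit=0.025):
    """Compare two dose maps voxel by voxel inside the high-dose region.

    Assumptions (not stated in the abstract): the mask is built from the
    reference dose, and the difference is relative to the local reference dose.
    """
    mask = high_dose_mask(dose_ref)
    worst_unc = 0.0
    worst_diff = 0.0
    voxels = zip(mask, dose_test, dose_ref, rel_unc_test, rel_unc_ref)
    for keep, d_test, d_ref, u_test, u_ref in voxels:
        if not keep:
            continue  # low-dose voxels are excluded from both criteria
        worst_unc = max(worst_unc, u_test, u_ref)
        worst_diff = max(worst_diff, abs(d_test - d_ref) / d_ref)
    return {
        "n_voxels": sum(mask),
        "worst_unc": worst_unc,
        "worst_diff": worst_diff,
        "passed": worst_unc < unc_limit and worst_diff <= diff_limit,
    }


if __name__ == "__main__":
    # Toy four-voxel example with made-up numbers, for illustration only.
    reference = [1.00, 0.50, 0.10, 0.25]
    candidate = [1.01, 0.49, 0.20, 0.25]
    uncertainty = [0.01, 0.01, 0.05, 0.01]
    print(check_agreement(candidate, reference, uncertainty, uncertainty))
```

In this toy case the third voxel falls below the 20% cutoff, so its large uncertainty and large difference do not affect the outcome.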