INFN Sezione di Genova, Via Dodecaneso 33, Genova 16146, Italy. Department of Nuclear Engineering, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul 04763, Republic of Korea.
Phys Med Biol. 2018 May 4;63(9):09NT02. doi: 10.1088/1361-6560/aabd20.
In this study, the multi-threading performance of the Geant4, MCNP6, and PHITS codes was evaluated as a function of the number of threads (N) and the complexity of the tetrahedral-mesh phantom. For this purpose, three tetrahedral-mesh phantoms of varying complexity (simple, moderately complex, and highly complex) were prepared and implemented in the three Monte Carlo codes for photon and neutron transport simulations. For each case, the initialization time, calculation time, and memory usage were then measured as a function of the number of threads used in the simulation. For all codes, the initialization time increased significantly with the complexity of the phantom but not with the number of threads. Geant4 exhibited much longer initialization times than the other codes, especially for the complex phantom (MRCP). The improvement in computation speed from multi-threading was quantified as the speed-up factor, defined as the ratio of the computation speed of the multi-threaded run to that of the single-threaded run. Geant4 showed the best multi-threading performance among the codes considered in this study, with the speed-up factor increasing almost linearly with the number of threads and reaching ~30 at N = 40. PHITS and MCNP6 showed a much smaller increase in the speed-up factor with the number of threads. For PHITS, the speed-up factors were low at N = 40. For MCNP6, the speed-up factors increased more, but were still below ~10 at N = 40. As for memory usage, Geant4 used more memory than the other codes. In addition, the memory usage of Geant4 increased more rapidly with the number of threads than that of the other codes, reaching as high as ~74 GB at N = 40 for the complex phantom (MRCP).
Notably, the memory usage of PHITS was much lower than that of the other codes regardless of both the complexity of the phantom and the number of threads, and for the MRCP it hardly increased with the number of threads.
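The speed-up factor compares the computation speed with N threads to that with a single thread. The Python sketch below is a hedged illustration and not the study's analysis code. It assumes a fixed number of simulated histories, so the ratio of speeds equals the inverse ratio of calculation times. It also adds parallel efficiency (S/N), which is a common companion metric not reported in the abstract. The function names and the timing values in the usage line are hypothetical.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Speed-up factor S(N) = (speed with N threads) / (speed with 1 thread).

    Assumes both runs simulate the same number of histories, so the speed
    ratio reduces to t_single / t_multi (hypothetical helper).
    """
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("calculation times must be positive")
    return t_single / t_multi


def efficiency(s: float, n_threads: int) -> float:
    """Parallel efficiency E(N) = S(N) / N; 1.0 means ideal linear scaling."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    return s / n_threads


if __name__ == "__main__":
    # Hypothetical timings: 1200 s single-threaded vs 40 s with 40 threads.
    s = speedup(1200.0, 40.0)
    print(s, efficiency(s, 40))  # 30.0 0.75
```

Applied to the reported values, a speed-up factor of ~30 at N = 40 (Geant4) corresponds to an efficiency of about 0.75. MCNP6's speed-up factors below ~10 at N = 40 correspond to efficiencies below about 0.25.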