Wang Jingjing, Liu Xiaoyu, Li Yuxin, Mao Ruina
School of Information Engineering, Shandong Youth University of Political Science, Jinan 250103, China.
School of Information Science and Engineering, Northeastern University, Shenyang 110819, China.
Materials (Basel). 2025 Apr 25;18(9):1955. doi: 10.3390/ma18091955.
Microstructure simulations of continuous casting billets are vital for understanding solidification mechanisms and optimizing process parameters. However, the commonly used CA (Cellular Automaton) model is limited by grid anisotropy, which reduces the accuracy of simulated dendrite morphology. The DCSA (Decentered Square Algorithm) reduces this anisotropy, but its high computational cost, caused by fine grids and dynamic tracking of the liquid/solid interface, hinders large-scale applications. To address this, we propose a high-performance CA-DCSA method on GPUs (Graphics Processing Units). The CA-DCSA algorithm is first refactored and implemented on a CPU-GPU heterogeneous architecture for efficient acceleration. Key optimizations, including memory access management and warp divergence reduction, are then introduced to improve GPU utilization. Finally, the simulation results are validated against industrial experiments, with relative errors of 2.5% (equiaxed crystal ratio) and 2.3% (average secondary dendrite arm spacing) in 65# steel, and 2.1% and 0.7%, respectively, in 60# steel. The maximum temperature difference in 65# steel is 1.8 °C. Compared with the serial implementation, the GPU-accelerated method achieves a 1430× speedup using two GPUs. This work provides a powerful tool for detailed microstructure observation and process parameter optimization in continuous casting billets.
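The grid anisotropy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' CA-DCSA implementation. It assumes a toy capture rule in which a liquid cell solidifies as soon as any of its four nearest neighbours is solid, and the names `grow` and `extent` are hypothetical helpers introduced only for this illustration.

```python
import math


def grow(n_steps, size):
    """Toy CA growth (assumed rule, not the paper's model): a liquid cell
    solidifies when any of its four nearest neighbours is solid."""
    c = size // 2
    solid = {(c, c)}  # single nucleus at the grid centre
    for _ in range(n_steps):
        captured = set(solid)
        for (i, j) in solid:
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < size and 0 <= nj < size:
                    captured.add((ni, nj))
        solid = captured
    return solid


def extent(solid, size, direction):
    """Number of contiguous solid cells from the centre along `direction`."""
    c = size // 2
    di, dj = direction
    k = 0
    while (c + (k + 1) * di, c + (k + 1) * dj) in solid:
        k += 1
    return k


if __name__ == "__main__":
    size, steps = 41, 10
    grain = grow(steps, size)
    along_axis = extent(grain, size, (1, 0))                 # in cell widths
    along_diag = extent(grain, size, (1, 1)) * math.sqrt(2)  # in cell widths
    print("reach along grid axis:", along_axis)
    print("reach along diagonal :", round(along_diag, 2))
```

Under this assumed rule the grain reaches farther along the grid axes than along the diagonals, so it grows as a diamond aligned with the mesh whatever the intended crystallographic orientation. This is the kind of artifact that, according to the abstract, the DCSA is used to reduce.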