IBBT-Vision Lab, University of Antwerp Universiteitsplein 1, B-2610, Wilrijk, Belgium.
J Struct Biol. 2011 Nov;176(2):250-3. doi: 10.1016/j.jsb.2011.07.017. Epub 2011 Aug 5.
Iterative reconstruction algorithms are becoming increasingly important in electron tomography of biological samples. These algorithms, however, impose major computational demands, and parallelization is required to keep running times acceptable. Graphics Processing Units (GPUs) have been shown to be highly cost-effective for carrying out these computations with a high degree of parallelism. In a recent paper, Xu et al. (2010) presented a GPU implementation strategy that achieves an order-of-magnitude speedup over a previously proposed GPU-based electron tomography implementation. In this technical note, we demonstrate that making alternative design decisions in the GPU implementation yields a further speedup, again of an order of magnitude. By carefully considering memory access locality when dividing the workload among blocks of threads, we use the GPU's cache more efficiently and thereby make more effective use of the available memory bandwidth.
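The locality argument in the last sentence can be illustrated with a toy model. The sketch below assumes a 2D parallel-beam, voxel-driven backprojection with nearest-neighbour sampling. It uses the number of distinct detector bins that one thread block reads per projection angle as a rough stand-in for that block's cache footprint. The function names (`bins_touched`, `row_blocks`, `tile_blocks`, `mean_footprint`) are hypothetical and chosen for illustration only. This is not the authors' CUDA implementation, and real GPU cache behaviour also depends on cache-line size, texture hardware and scheduling.

```python
import math


def bins_touched(voxels, theta):
    """Count distinct detector bins read when backprojecting `voxels` at angle theta.

    Toy assumptions: parallel beam, unit-width detector bins, nearest-neighbour
    lookup (bin index = floor of the projected coordinate).
    """
    c, s = math.cos(theta), math.sin(theta)
    return len({math.floor(x * c + y * s) for x, y in voxels})


def row_blocks(n, size):
    """Split an n x n slice into 1D blocks of `size` consecutive voxels along x."""
    voxels = [(x, y) for y in range(n) for x in range(n)]
    return [voxels[i:i + size] for i in range(0, len(voxels), size)]


def tile_blocks(n, side):
    """Split an n x n slice into compact side x side square tiles."""
    assert n % side == 0, "toy model assumes n is a multiple of the tile side"
    blocks = []
    for y0 in range(0, n, side):
        for x0 in range(0, n, side):
            blocks.append([(x, y)
                           for y in range(y0, y0 + side)
                           for x in range(x0, x0 + side)])
    return blocks


def mean_footprint(blocks, angles):
    """Average number of distinct bins one block reads per projection angle."""
    total = sum(bins_touched(b, t) for b in blocks for t in angles)
    return total / (len(blocks) * len(angles))


if __name__ == "__main__":
    angles = [k * math.pi / 60 for k in range(60)]  # 60 angles over [0, pi)
    # Both layouts use 64 voxels per block, so only the block shape differs.
    print("1D rows    :", mean_footprint(row_blocks(64, 64), angles))
    print("8x8 tiles  :", mean_footprint(tile_blocks(64, 8), angles))
```

In this model a square tile's projected extent grows with its side length rather than with the total number of voxels in the block, so the 8x8 tiles should read several times fewer distinct bins per block than 64-voxel rows. Fewer distinct addresses per block is the kind of locality that lets a cache serve more of the reads.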