Department of Electrical Engineering, Stanford University, Stanford, California 94305.
Med Phys. 2011 Dec;38(12):6775-86. doi: 10.1118/1.3661998.
List-mode processing is an efficient way of dealing with the sparse nature of positron emission tomography (PET) data sets and is the processing method of choice for time-of-flight (ToF) PET image reconstruction. However, the massive amount of computation involved in forward projection and backprojection limits the application of list-mode reconstruction in practice, and makes it challenging to incorporate accurate system modeling.
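To make the cost of forward projection and backprojection concrete, the following is a minimal Python sketch of a list-mode OSEM update on a toy problem. It assumes each event is stored as a sparse list of (voxel index, weight) pairs and that a sensitivity image is precomputed. The function names and data layout are illustrative assumptions, not the authors' implementation.

```python
def forward_project(image, row):
    """Expected value along one LOR: weighted sum of the voxels it crosses."""
    return sum(w * image[j] for j, w in row)


def back_project(accum, row, value):
    """Spread a scalar back along one LOR into the accumulation image."""
    for j, w in row:
        accum[j] += w * value


def listmode_osem(events, sensitivity, n_voxels, n_subsets, n_iters):
    """Toy list-mode OSEM.

    events      : list of sparse rows, each a list of (voxel_index, weight)
    sensitivity : per-voxel sensitivity image (assumed precomputed)
    """
    image = [1.0] * n_voxels
    # Interleaved subsets: every event is visited once per iteration.
    subsets = [events[s::n_subsets] for s in range(n_subsets)]
    for _ in range(n_iters):
        for subset in subsets:
            accum = [0.0] * n_voxels
            for row in subset:
                expected = forward_project(image, row)   # one line projection
                if expected > 0.0:
                    back_project(accum, row, 1.0 / expected)  # one backprojection
            # Each subset is assumed to see 1/n_subsets of the sensitivity.
            for j in range(n_voxels):
                if sensitivity[j] > 0.0:
                    image[j] *= accum[j] * n_subsets / sensitivity[j]
    return image


# Toy data: six events that only see voxel 0, two that only see voxel 1.
toy_events = [[(0, 1.0)]] * 6 + [[(1, 1.0)]] * 2
toy_image = listmode_osem(toy_events, [1.0, 1.0], n_voxels=2,
                          n_subsets=2, n_iters=3)
```

Every event triggers one forward projection and one backprojection per iteration, so the work grows with the number of events times the number of voxels each line crosses. Because events are independent within a subset, this inner loop is the natural target for parallelization.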
The authors present a novel formulation for computing line projection operations on graphics processing units (GPUs) using the compute unified device architecture (CUDA) framework, and apply the formulation to list-mode ordered-subsets expectation maximization (OSEM) image reconstruction. The method overcomes well-known GPU challenges such as divergence of compute threads, limited bandwidth of global memory, and limited size of shared memory, while exploiting GPU capabilities such as fast access to shared memory and efficient linear interpolation of texture memory. The execution time and image quality of the GPU-CUDA method are compared with those of a central processing unit (CPU) method on several data sets acquired on a preclinical scanner and a clinical ToF scanner.
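As a rough CPU analogue of a line projection that relies on linear interpolation, the sketch below integrates a 2D image along a line by taking equally spaced samples and interpolating bilinearly at each one, which is the kind of fetch that GPU texture hardware performs. The sampling scheme, the 2D geometry, the coordinate convention (voxel centers at integer coordinates), and all function names are assumptions made for illustration. The abstract does not specify the authors' formulation, and this is not it.

```python
import math


def bilinear(image, x, y):
    """Sample image[row][col] at continuous (x=col, y=row); zero outside."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - fx) * (1 - fy) * px(y0, x0)
            + fx * (1 - fy) * px(y0, x0 + 1)
            + (1 - fx) * fy * px(y0 + 1, x0)
            + fx * fy * px(y0 + 1, x0 + 1))


def line_forward_project(image, p0, p1, n_samples):
    """Approximate the line integral from p0 to p1 with midpoint sampling."""
    (xa, ya), (xb, yb) = p0, p1
    step = math.hypot(xb - xa, yb - ya) / n_samples
    total = 0.0
    for k in range(n_samples):
        t = (k + 0.5) / n_samples
        total += bilinear(image, xa + t * (xb - xa), ya + t * (yb - ya))
    return total * step


uniform = [[1.0] * 4 for _ in range(4)]
inside = line_forward_project(uniform, (0.0, 1.5), (3.0, 1.5), n_samples=30)
outside = line_forward_project(uniform, (0.0, 10.0), (3.0, 10.0), n_samples=30)
```

Each line is processed independently and every sample performs the same arithmetic, so a loop of this shape maps onto one thread (or group of threads) per line with little branching, which is one way to limit thread divergence.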
When applied to line projection operations for non-ToF list-mode PET, this new GPU-CUDA method is >200 times faster than a single-threaded reference CPU implementation. For ToF reconstruction, we exploit a ToF-specific optimization to improve the efficiency of our parallel processing method, resulting in GPU reconstruction >300 times faster than the CPU counterpart. For a typical whole-body scan with a 75 × 75 × 26 image matrix, 40.7 million lines of response (LORs), 33 subsets, and 3 iterations, the overall processing time is 7.7 s on the GPU and 42 min on a single-threaded CPU. Image quality and accuracy are preserved for multiple imaging configurations and reconstruction parameters, with normalized root mean squared (RMS) deviation less than 1% between CPU and GPU-generated images for all cases.
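The whole-body timing figures can be checked with a few lines of arithmetic. The sketch below computes the overall speedup and the event throughput from the numbers quoted above, and also shows one common definition of normalized RMS deviation. The abstract does not state which normalization the authors used, so dividing by the RMS of the reference image is an assumption, as are the function names.

```python
import math


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU to GPU wall-clock time."""
    return cpu_seconds / gpu_seconds


def normalized_rms_deviation(reference, test):
    """RMS of (test - reference) divided by RMS of reference (assumed form)."""
    num = sum((t - r) ** 2 for r, t in zip(reference, test))
    den = sum(r ** 2 for r in reference)
    return math.sqrt(num / den)


# Figures quoted in the text: 42 min on the CPU versus 7.7 s on the GPU.
overall = speedup(42 * 60, 7.7)              # about 327

# 40.7 million LORs, each visited once per iteration, for 3 iterations.
lor_passes_per_second = 40.7e6 * 3 / 7.7     # about 15.9 million per second

# Tiny example: a 1% deviation relative to the reference image.
example_nrmsd = normalized_rms_deviation([3.0, 4.0], [3.0, 4.05])
```

The overall ratio of about 327 is consistent with the >300-fold figure reported for ToF reconstruction.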
A list-mode ToF OSEM library was developed on the GPU-CUDA platform. Our studies show that the GPU reformulation is considerably faster than a single-threaded reference CPU method, especially for ToF processing, while producing virtually identical images. This new method can be easily adapted to enable more advanced algorithms for high-resolution PET reconstruction based on additional information such as depth of interaction (DoI), photon energy, and point spread functions (PSFs).