Samant Sanjiv S, Xia Junyi, Muyan-Ozcelik Pinar, Owens John D
Department of Nuclear and Radiological Engineering, University of Florida, Gainesville, Florida 32611-8300, USA.
Med Phys. 2008 Aug;35(8):3546-53. doi: 10.1118/1.2948318.
Readily available temporal imaging, or time series volumetric (4D) imaging, has become an indispensable component of treatment planning and adaptive radiotherapy (ART) at many radiotherapy centers. Deformable image registration (DIR) is also used in other areas of medical imaging, including motion corrected image reconstruction. Because of long computation times, clinical applications of DIR in radiation therapy and elsewhere have been limited and consequently relegated to offline analysis. With recent advances in hardware and software, graphics processing unit (GPU) based computing is an emerging technology for general purpose computation, including DIR, and is well suited to highly parallelized computing. However, traditional general purpose computation on the GPU is limited because of the constraints of the available programming platforms. In addition, compared with the CPU, the GPU currently has less dedicated processor memory, which can limit the useful working data set for parallelized processing. We present an implementation of the demons algorithm using the NVIDIA 8800 GTX GPU and the new CUDA programming language. GPU performance is compared with single-threaded and multithreaded CPU implementations, written in the C programming language, on an Intel dual core 2.4 GHz CPU. CUDA provides a C-like programming interface and allows direct access to the highly parallel compute units in the GPU. Comparisons were carried out on volumetric clinical lung images acquired using 4DCT. For the GPU, computation times for 100 iterations ranged from 1.8 to 13.5 s for image sizes ranging from 2.0 × 10^6 to 14.2 × 10^6 pixels. The GPU registration was 55-61 times faster than the single-threaded CPU implementation and 34-39 times faster than the multithreaded CPU implementation. For CPU based computing, the computational time generally has a linear dependence on image size for medical imaging data.
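As an illustration of why the demons algorithm maps well onto parallel hardware, the following is a minimal C sketch of a single force computation pass. It assumes the classic Thirion-style update, u = (m - f) ∇f / (|∇f|² + (m - f)²), with central differences on the fixed image; the abstract does not state which demons variant or discretization was used, and the names `vox` and `demons_force` are hypothetical. The Gaussian smoothing of the displacement field that normally follows each pass is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index into an nx*ny*nz volume stored x-fastest (assumed layout). */
static size_t vox(size_t x, size_t y, size_t z, size_t nx, size_t ny)
{
    return (z * ny + y) * nx + x;
}

/*
 * One pass of a Thirion-style demons force over the interior voxels:
 *   u = (m - f) * grad(f) / (|grad(f)|^2 + (m - f)^2)
 * fixed, moving : input volumes of nx*ny*nz floats
 * ux, uy, uz    : output displacement update, one component per volume
 * Border voxels are not written; the caller should zero-initialize outputs.
 */
void demons_force(const float *fixed, const float *moving,
                  float *ux, float *uy, float *uz,
                  size_t nx, size_t ny, size_t nz)
{
    const float eps = 1e-9f; /* guards against division by zero */

    assert(nx >= 3 && ny >= 3 && nz >= 3);

    for (size_t z = 1; z + 1 < nz; ++z) {
        for (size_t y = 1; y + 1 < ny; ++y) {
            for (size_t x = 1; x + 1 < nx; ++x) {
                size_t i = vox(x, y, z, nx, ny);

                /* Central-difference gradient of the fixed image. */
                float gx = 0.5f * (fixed[vox(x + 1, y, z, nx, ny)] -
                                   fixed[vox(x - 1, y, z, nx, ny)]);
                float gy = 0.5f * (fixed[vox(x, y + 1, z, nx, ny)] -
                                   fixed[vox(x, y - 1, z, nx, ny)]);
                float gz = 0.5f * (fixed[vox(x, y, z + 1, nx, ny)] -
                                   fixed[vox(x, y, z - 1, nx, ny)]);

                float diff = moving[i] - fixed[i];
                float denom = gx * gx + gy * gy + gz * gz + diff * diff;

                if (denom > eps) {
                    ux[i] = diff * gx / denom;
                    uy[i] = diff * gy / denom;
                    uz[i] = diff * gz / denom;
                } else {
                    ux[i] = uy[i] = uz[i] = 0.0f;
                }
            }
        }
    }
}
```

Each voxel's update reads only its own value and its six neighbors and writes only its own output, so the loop body could in principle be assigned to one GPU thread per voxel. This is a sketch of that general idea, not the authors' CUDA kernel.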
Computational efficiency is characterized in terms of time per megapixel per iteration (TPMI), with units of seconds per megapixel per iteration (spmi). For the demons algorithm, our CPU implementation yielded largely invariant values of TPMI. The mean TPMIs were 0.527 spmi and 0.335 spmi for the single-threaded and multithreaded cases, respectively, with <2% variation over the considered image data range. For GPU computing, we achieved TPMI = 0.00916 spmi with 3.7% variation, indicating optimized memory handling under CUDA. The paradigm of GPU based real-time DIR opens up a host of clinical applications for medical imaging.
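The TPMI figures can be cross-checked against the reported timings with a short calculation. The C sketch below assumes TPMI is total time divided by the product of image size in megapixels and iteration count, which is what the unit implies; the function names `tpmi` and `speedup` are hypothetical.

```c
#include <assert.h>

/* Time per megapixel per iteration (spmi):
 * seconds / (megapixels * iterations). */
double tpmi(double seconds, double n_pixels, int iterations)
{
    assert(n_pixels > 0.0 && iterations > 0);
    return seconds / ((n_pixels / 1.0e6) * iterations);
}

/* Speedup implied by two TPMI values (reference over accelerated). */
double speedup(double tpmi_ref, double tpmi_fast)
{
    assert(tpmi_fast > 0.0);
    return tpmi_ref / tpmi_fast;
}
```

Assuming the shortest reported time pairs with the smallest image and the longest with the largest, `tpmi(1.8, 2.0e6, 100)` gives 0.009 spmi and `tpmi(13.5, 14.2e6, 100)` gives about 0.0095 spmi, both close to the reported GPU mean of 0.00916 spmi. Likewise, `speedup(0.527, 0.00916)` is about 57.5 and `speedup(0.335, 0.00916)` is about 36.6, which fall inside the reported 55-61 and 34-39 ranges.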