A fast forward projection using multithreads for multirays on GPUs in medical image reconstruction.

Affiliation

Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106, Taiwan.

Publication information

Med Phys. 2011 Jul;38(7):4052-65. doi: 10.1118/1.3591994.

DOI:10.1118/1.3591994
PMID:21859004
Abstract

PURPOSE

Iterative reconstruction techniques hold great potential to mitigate the effects of data noise and/or incompleteness, and hence can facilitate patient dose reduction. However, they are not suitable for routine clinical practice because of their long reconstruction times. In this work, the authors accelerated the computations by taking full advantage of the highly parallel computational power of single and multiple graphics processing units (GPUs). In particular, the forward projection algorithm, which is not included in the closed-form formulas, is accelerated and optimized on the GPU.
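To illustrate why forward projection dominates iterative reconstruction, the sketch below runs the textbook multiplicative expectation-maximization (MLEM) update on a tiny dense system. The abstract does not give the authors' update formula, so the standard MLEM form is an assumption here, and the names `forward_project`, `back_project`, and `mlem` as well as the 3 x 3 matrix `A` are hypothetical. Every iteration calls one forward projection and one backprojection, which is the cost the paper targets.

```python
# Minimal MLEM sketch on a toy system (assumed textbook update, not the paper's code).

def forward_project(A, x):
    """Forward projection: y_i = sum_j A[i][j] * x[j]."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def back_project(A, r):
    """Backprojection: b_j = sum_i A[i][j] * r[i]."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def mlem(A, y, iterations=200, eps=1e-12):
    """Run the multiplicative MLEM update starting from a uniform image."""
    x = [1.0] * len(A[0])
    sens = back_project(A, [1.0] * len(A))      # sensitivity image A^T 1
    for _ in range(iterations):
        est = forward_project(A, x)             # one forward projection per iteration
        ratio = [m / max(e, eps) for m, e in zip(y, est)]
        corr = back_project(A, ratio)           # one backprojection per iteration
        x = [v * c / max(s, eps) for v, c, s in zip(x, corr, sens)]
    return x

# Hypothetical toy problem: three "rays", each crossing two of three "voxels".
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0]]
x_true = [1.0, 2.0, 3.0]
y = forward_project(A, x_true)   # noiseless measurements
x_rec = mlem(A, y)
```

With noiseless, consistent data the iterate approaches `x_true`; in a real scanner geometry `A` is far too large to store, which is why the projection operators are computed on the fly.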

METHODS

The main contribution is a novel forward projection algorithm that uses multiple threads to handle the computations for a bundle of adjacent rays simultaneously. The proposed algorithm is free of divergence and bank conflicts on the GPU, and benefits from data locality and data reuse. It achieves its efficiency in particular by (i) employing a tiled algorithm with three-level parallelization, (ii) optimizing the thread block size, (iii) maximizing data reuse in constant memory and shared memory, and (iv) exploiting the built-in interpolation capability of texture memory. In addition, to accelerate the iterative algorithms and the Feldkamp-Davis-Kress (FDK) algorithm on the GPU, the authors apply a batched fast Fourier transform (FFT) to expedite the filtering process in FDK and use projection bundling parallelism during backprojection to shorten the execution times of FDK and expectation-maximization (EM).
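The ray-bundling idea can be sketched on the CPU as follows. This is a sequential 2D parallel-beam emulation under stated assumptions, not the authors' CUDA kernel: `bilinear` stands in for hardware texture interpolation, each group of `tile` adjacent detector bins plays the role of one thread block, and the geometry, the step size `ds`, and all function names are hypothetical. The three-level tiling itself is not described in the abstract and is not reproduced here.

```python
import math

def bilinear(img, x, y):
    """Bilinear sample of img at (x, y); zero outside the pixel-center grid.
    Stands in for texture-memory interpolation (assumption)."""
    n = len(img)
    if x < 0.0 or y < 0.0 or x > n - 1 or y > n - 1:
        return 0.0
    x0 = min(int(x), n - 2)
    y0 = min(int(y), n - 2)
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * img[y0][x0] + fx * img[y0][x0 + 1]
    bot = (1.0 - fx) * img[y0 + 1][x0] + fx * img[y0 + 1][x0 + 1]
    return (1.0 - fy) * top + fy * bot

def forward_project(img, angles, n_det, tile=4, ds=0.25):
    """Ray-driven projection of a square image (side >= 2), marching bundles of adjacent rays."""
    n = len(img)
    c = (n - 1) / 2.0                       # rotation center in pixel coordinates
    n_steps = int(2 * n / ds) + 1           # samples along each ray, s in [-n, n]
    sino = []
    for theta in angles:
        dx, dy = math.cos(theta), math.sin(theta)
        row = [0.0] * n_det
        for first in range(0, n_det, tile):             # one bundle of adjacent rays
            rays = range(first, min(first + tile, n_det))
            for k in range(n_steps):                    # all rays advance in lockstep,
                s = -n + k * ds                         # so they sample nearby pixels
                for r in rays:
                    t = r - (n_det - 1) / 2.0           # signed detector offset
                    x = c + s * dx - t * dy
                    y = c + s * dy + t * dx
                    row[r] += bilinear(img, x, y) * ds
        sino.append(row)
    return sino

img = [[1.0] * 8 for _ in range(8)]
sino = forward_project(img, [0.0, math.pi / 6], n_det=8, tile=4)
```

For the uniform 8 x 8 test image at angle 0, each ray integral is close to 7, the distance between the first and last pixel centers. The bundle size changes only the loop order, not the result, which is what lets a GPU version trade it freely for data locality.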

RESULTS

Numerical experiments conducted on an NVIDIA Tesla C1060 GPU demonstrated the superiority of the proposed algorithms in computational time saving. The forward projection, filtering, and backprojection times for generating a volume image of 512 x 512 x 512 from 360 projections of 512 x 512 using one GPU are about 4.13, 0.65, and 2.47 s (including distance weighting), respectively. In particular, the proposed forward projection algorithm is ray-driven, and its parallelization strategy evolves from single-thread-for-single-ray (38.56 s), through multithreads-for-single-ray (26.05 s), to multithreads-for-multirays (4.13 s). For the voxel-driven backprojection, the use of texture memory reduces the reconstruction time from 4.95 to 3.35 s. Applying the projection bundle technique further reduces the computation time to 2.47 s. When multiple GPUs were employed, near-perfect speedups were observed as the number of GPUs increased. For example, with four GPUs, the times for the forward projection, filtering, and backprojection are further reduced to 1.11, 0.18, and 0.66 s. The results obtained by the GPU-based algorithms are virtually indistinguishable from those obtained on the CPU.
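The speedups implied by these timings can be worked out directly. The short script below uses only the figures reported above; the helper name `speedup` and the dictionary layout are illustrative choices, and parallel efficiency is taken as speedup divided by the number of GPUs.

```python
# Timings in seconds, copied from the reported results.
single_gpu = {"forward projection": 4.13, "filtering": 0.65, "backprojection": 2.47}
four_gpus = {"forward projection": 1.11, "filtering": 0.18, "backprojection": 0.66}

def speedup(t_before, t_after):
    """Ratio of the slower time to the faster time."""
    return t_before / t_after

# Forward projection: effect of the parallelization strategy on one GPU.
vs_single_thread_ray = speedup(38.56, 4.13)   # single-thread-for-single-ray baseline
vs_multithread_ray = speedup(26.05, 4.13)     # multithreads-for-single-ray baseline

# Backprojection: texture memory alone, then texture memory plus projection bundling.
texture_gain = speedup(4.95, 3.35)
bundle_gain = speedup(4.95, 2.47)

# Multi-GPU scaling: efficiency = speedup / number of GPUs.
efficiency = {k: speedup(single_gpu[k], four_gpus[k]) / 4 for k in single_gpu}

for name, eff in efficiency.items():
    print(f"{name}: {eff:.0%} parallel efficiency on four GPUs")
```

The multithreads-for-multirays kernel is roughly 9.3 times faster than the single-thread-for-single-ray version and about 6.3 times faster than multithreads-for-single-ray. On four GPUs the three stages reach about 90 to 94 percent parallel efficiency, consistent with the near-perfect scaling described.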

CONCLUSIONS

The authors have proposed a highly optimized GPU-based forward projection algorithm, as well as GPU-based FDK and expectation-maximization reconstruction algorithms. Our compute unified device architecture (CUDA) codes provide exceedingly fast forward projection and backprojection that outperform implementations based on shading languages, the Cell Broadband Engine Architecture, and previous CUDA implementations. The reconstruction times of the FDK and EM algorithms were considerably shortened, which can facilitate their routine use in a variety of applications such as image quality improvement and dose reduction.
