Zhang Bo, Yang Xiang, Yang Fei, Yang Xin, Qin Chenghu, Han Dong, Ma Xibo, Liu Kai, Tian Jie
Sino-Dutch Biomedical and Information Engineering School of Northeastern University, Shenyang, China.
Opt Express. 2010 Sep 13;18(19):20201-14. doi: 10.1364/OE.18.020201.
In molecular imaging (MI), and in optical molecular imaging in particular, bioluminescence tomography (BLT) has emerged as an effective modality for small-animal imaging. Finite element methods (FEMs), especially the adaptive finite element (AFE) framework, play an important role in BLT. However, the processing speed of FEMs and the AFE framework still needs improvement, even though multi-threaded CPU and multi-CPU technologies have already been applied. In this paper, we introduce, for the first time, graphics processing unit (GPU) acceleration of the AFE framework for BLT. Beyond raw processing speed, GPU technology offers a good balance between cost and performance. CUBLAS and CULA are two powerful libraries for programming NVIDIA GPUs; with them, coding for an NVIDIA GPU is straightforward and does not require attention to the hardware details of a specific device. Numerical experiments are designed to demonstrate the necessity, effect, and application of the proposed CUBLAS- and CULA-based GPU acceleration. The results show that the proposed method substantially improves the processing speed of the AFE framework while maintaining a balance between cost and performance.