IEEE Trans Ultrason Ferroelectr Freq Control. 2014 Jan;61(1):207-13. doi: 10.1109/TUFFC.2014.6689790.
Deformation of tissue can be accurately estimated from radio-frequency ultrasound data using a 2-dimensional normalized cross-correlation (NCC)-based algorithm. This procedure, however, is computationally expensive. A major reduction in computation time can be achieved by parallelizing the numerous NCC computations. In this paper, two approaches to parallelization were investigated: the OpenMP interface on a multi-CPU system and Compute Unified Device Architecture (CUDA) on a graphics processing unit (GPU). The performance of the OpenMP and GPU approaches was compared with that of a conventional MATLAB implementation of NCC. The OpenMP approach with 8 threads achieved a maximum speed-up factor of 132 for the NCC computation, whereas the GPU approach on an Nvidia Tesla K20 achieved a maximum speed-up factor of 376. Neither parallelization approach resulted in a significant loss in image quality of the elastograms. Parallelization of the NCC computations using the GPU, therefore, significantly reduces the computation time and increases the frame rate for motion estimation.
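To illustrate why the NCC computations parallelize well, the following is a minimal serial sketch in pure Python, not the authors' implementation. It assumes the mean-subtracted form of NCC and an exhaustive integer-shift search; the function names (`ncc`, `crop`, `best_shift`) and all parameters are hypothetical. Each candidate shift, and each kernel position in the image, is scored independently of the others, which is the kind of work that could be distributed over OpenMP threads or GPU threads.

```python
import math


def ncc(a, b):
    """Mean-subtracted normalized cross-correlation of two equally sized 2-D windows."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    n = len(fa)
    mean_a = sum(fa) / n
    mean_b = sum(fb) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in fa) * sum((y - mean_b) ** 2 for y in fb)
    )
    # A constant window has zero variance; return 0 instead of dividing by zero.
    return num / den if den > 0 else 0.0


def crop(img, top, left, h, w):
    """Extract an h-by-w window whose upper-left corner is at (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]


def best_shift(pre, post, top, left, h, w, max_shift):
    """Return (score, dy, dx) for the integer shift that maximizes NCC.

    The kernel is taken from the pre-deformation frame and compared with
    every candidate window in the post-deformation frame within
    +/- max_shift samples. Each candidate is scored independently, so the
    two loops below are the natural place to parallelize.
    """
    kernel = crop(pre, top, left, h, w)
    best = (-2.0, 0, 0)  # NCC lies in [-1, 1], so -2 is below any real score
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            t, l = top + dy, left + dx
            # Skip candidate windows that fall outside the search frame.
            if t < 0 or l < 0 or t + h > len(post) or l + w > len(post[0]):
                continue
            score = ncc(kernel, crop(post, t, l, h, w))
            if score > best[0]:
                best = (score, dy, dx)
    return best
```

Calling `best_shift` once per kernel position in the pre-deformation frame would yield a coarse displacement map. Subsample interpolation of the correlation peak and the strain computation needed for an elastogram are outside the scope of this sketch.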