IEEE Trans Ultrason Ferroelectr Freq Control. 2018 Aug;65(8):1370-1379. doi: 10.1109/TUFFC.2018.2841346. Epub 2018 May 28.
A multilevel Lagrangian carotid strain imaging algorithm is analyzed to identify computational bottlenecks for implementation on a graphics processing unit (GPU). Displacement tracking, including regularization, was found to be the most computationally expensive part of the algorithm, taking about 2.2 h for an entire cardiac cycle. This intensive displacement tracking is essential for obtaining Lagrangian strain tensors. However, most of the computational techniques used for displacement tracking are parallelizable, so a GPU implementation is expected to be beneficial. A new scheme for subsample displacement estimation, referred to as a multilevel global peak finder, was also developed because the Nelder-Mead simplex optimization technique used in the CPU implementation was not suitable for GPU implementation. GPU optimizations that minimize thread divergence and make use of shared and texture memories were also implemented, enabling efficient use of the GPU computational hardware and memory bandwidth. Overall, the application speedup obtained enables the algorithm to finish in about 50 s for a cardiac cycle. Finally, a comparison of the GPU and CPU implementations demonstrated no significant difference in the quality of displacement vector and strain tensor estimation for interframe deformations of up to 5%. Hence, a GPU implementation is feasible for clinical adoption and opens opportunities for other computationally intensive techniques.
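The abstract does not describe the internals of the multilevel global peak finder, so the following is only a minimal sketch of one plausible reading: a coarse-to-fine grid search for the subsample peak of a similarity surface. The function name `multilevel_peak`, its parameters, and the toy `score` surface are all hypothetical and are not taken from the paper. The point it illustrates is that every level evaluates a fixed number of grid nodes, so the work per estimate is constant, whereas Nelder-Mead takes a data-dependent number of iterations and branches.

```python
def multilevel_peak(score, center, half_width=1.0, grid=5, levels=4):
    """Coarse-to-fine grid search for the maximum of score(x, y).

    Hypothetical illustration, not the published algorithm.
    score      -- callable returning a similarity value at a subsample offset
    center     -- (x, y) integer-sample peak used as the starting point
    half_width -- half-size of the search window at the coarsest level
    grid       -- nodes per axis at every level (odd, so the center is a node)
    levels     -- number of refinement levels
    """
    cx, cy = center
    best = score(cx, cy)
    for _ in range(levels):
        step = 2.0 * half_width / (grid - 1)
        best_x, best_y = cx, cy
        # Fixed grid*grid evaluations per level: the same work for every
        # estimate, which maps naturally onto one GPU thread per node.
        for i in range(grid):
            for j in range(grid):
                x = cx - half_width + i * step
                y = cy - half_width + j * step
                s = score(x, y)
                if s > best:
                    best, best_x, best_y = s, x, y
        # Re-center on the winning node and shrink the window so the next
        # level spans only its immediate neighbours.
        cx, cy = best_x, best_y
        half_width = step
    return (cx, cy), best


# Toy similarity surface (an assumption) with its true peak at (0.3, -0.2).
def score(x, y):
    return -((x - 0.3) ** 2 + (y + 0.2) ** 2)


(peak_x, peak_y), peak_value = multilevel_peak(score, (0.0, 0.0))
print(peak_x, peak_y)
```

With the defaults, the grid spacing halves at each level and ends at 0.0625 samples, so for a smooth single-peaked surface the estimate lands within about half that spacing of the true peak. In practice `score` would be an interpolated cross-correlation function rather than an analytic paraboloid.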