Department of Mechanical Engineering, The University of Hong Kong, Pok Fu Lam, Hong Kong.
Department of Computing, Imperial College London, London, SW7 2AZ, UK.
Int J Comput Assist Radiol Surg. 2021 Mar;16(3):375-386. doi: 10.1007/s11548-020-02303-y. Epub 2021 Jan 23.
Intensity-based image registration has proven essential in many applications owing to its unparalleled ability to resolve image misalignments. However, its long registration time prohibits its use in intra-operative navigation systems. Much work has gone into accelerating the registration process by improving the algorithm's robustness, but the innate computational cost of the registration algorithm itself remains unresolved.
Intensity-based registration methods involve operations with high arithmetic load and memory access demand, a burden that graphics processing units (GPUs) are expected to reduce. Although GPUs are widespread and affordable, there is a lack of open-source GPU implementations optimized for non-rigid image registration. This paper demonstrates performance-aware programming techniques, which involve systematic exploitation of GPU features, by implementing the diffeomorphic log-demons algorithm.
By resolving the pinpointed computation bottlenecks on the GPU, our implementation of diffeomorphic log-demons on an Nvidia GTX Titan X GPU achieved a ~95 times speed-up over the CPU and registered a 1.3-M voxel image in 286 ms. Even for large 37-M voxel images, our implementation registers in 8.56 s, a ~258 times speed-up. Our solution makes effective use of GPU computation units, memory, and data bandwidth to resolve these bottlenecks.
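To put the reported timings in perspective, the short Python sketch below derives the GPU throughput and the CPU time implied by each quoted speed-up. It uses only the figures stated above and assumes that "M" means million voxels and that each speed-up applies to the same image as its quoted GPU time; because the speed-ups are approximate, the implied CPU times are rough estimates, not measurements from the paper. The function name `summarize` is illustrative only.

```python
# Figures quoted in the abstract; speed-ups are approximate ("~").
# Assumption: "M" denotes million voxels.
cases = {
    "1.3-M voxel image": {"voxels": 1.3e6, "gpu_s": 0.286, "speedup": 95},
    "37-M voxel image": {"voxels": 37e6, "gpu_s": 8.56, "speedup": 258},
}


def summarize(voxels, gpu_s, speedup):
    """Return (GPU throughput in Mvoxel/s, implied CPU time in seconds)."""
    throughput = voxels / gpu_s / 1e6   # million voxels registered per second
    implied_cpu_s = gpu_s * speedup     # estimate only: speed-up is approximate
    return throughput, implied_cpu_s


for name, case in cases.items():
    tp, cpu = summarize(**case)
    print(f"{name}: {tp:.2f} Mvoxel/s on GPU, implied CPU time ~{cpu:.0f} s")
```

Under these assumptions, both cases work out to roughly 4 to 5 million voxels per second on the GPU, while the implied CPU time grows from about 27 s for the smaller image to about 37 min for the larger one.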
The computation bottlenecks in diffeomorphic log-demons are pinpointed, analyzed, and resolved using various GPU performance-aware programming techniques. The proposed fast computation of basic image operations not only accelerates diffeomorphic log-demons, but can also potentially be extended to speed up many other intensity-based approaches. Our implementation is open-source on GitHub at https://bit.ly/2PYZxQz .