Mang Andreas, Gholami Amir, Davatzikos Christos, Biros George
Department of Mathematics, University of Houston, Houston, TX 77204-5008.
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720-1770.
SIAM J Sci Comput. 2019;41(5):C548-C584. doi: 10.1137/18m1207818. Epub 2019 Oct 24.
With this work we release CLAIRE, a distributed-memory implementation of an effective solver for constrained large deformation diffeomorphic image registration problems in three dimensions. We consider an optimal control formulation. We invert for a stationary velocity field that parameterizes the deformation map. Our solver is based on a globalized, preconditioned, inexact reduced space Gauss‒Newton‒Krylov scheme. We exploit state-of-the-art techniques in scientific computing to develop an effective solver that scales to thousands of distributed memory nodes on high-end clusters. We present the formulation, discuss algorithmic features, describe the software package, and introduce an improved preconditioner for the reduced space Hessian to speed up the convergence of our solver. We test registration performance on synthetic and real data. We demonstrate registration accuracy on several neuroimaging datasets. We compare the performance of our scheme against different flavors of the Demons algorithm for diffeomorphic image registration. We study convergence of our preconditioner and our overall algorithm. We report scalability results on state-of-the-art supercomputing platforms. We demonstrate that we can solve registration problems for clinically relevant data sizes in two to four minutes on a standard compute node with 20 cores, attaining excellent data fidelity. With the present work we achieve a speedup of (on average) 5× with a peak performance of up to 17× compared to our former work.