Ou Yangming, Akbari Hamed, Bilello Michel, Da Xiao, Davatzikos Christos
IEEE Trans Med Imaging. 2014 Oct;33(10):2039-65. doi: 10.1109/TMI.2014.2330355. Epub 2014 Jun 13.
Evaluating algorithms for the inter-subject registration of brain magnetic resonance images (MRI) is a necessary topic that is receiving growing attention. Existing studies have evaluated image registration algorithms in specific tasks or on specific databases (e.g., only skull-stripped images, or only single-site images). Consequently, the choice of registration algorithm appears to depend on the task, the usage and the parameter settings. Nevertheless, recent large-scale, often multi-institutional imaging studies make it necessary to ask whether some registration algorithms can 1) apply generally to tasks/databases that pose various challenges; 2) perform consistently well; and, while doing so, 3) require minimal or, ideally, no parameter tuning. To answer this question, we evaluated 12 general-purpose registration algorithms for their generality, accuracy and robustness. We fixed their parameters at the values suggested by the algorithm developers, as reported in the literature. We tested them on 7 databases/tasks, which present one or more of 4 commonly encountered challenges: 1) inter-subject anatomical variability in skull-stripped images; 2) intensity inhomogeneity, noise and large structural differences in raw images; 3) imaging protocol and field-of-view (FOV) differences in multi-site data; and 4) missing correspondences in pathology-bearing images. In total, 7,562 registrations were performed. Registration accuracy was measured using landmarks or regions of interest (ROIs) annotated by one or more experts. To ensure reproducibility, we used public software tools and, whenever possible, public databases, and we fully disclose the parameter settings. We present the evaluation results and discuss the algorithms' performance in light of their similarity metrics, transformation models and optimization strategies. We also discuss future directions for algorithm development and evaluation.
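The abstract states only that accuracy was scored against expert-annotated landmarks or ROIs. As an illustrative sketch, assuming two common measures for such annotations (Dice overlap for ROI labels and mean Euclidean distance for landmarks; the abstract does not say which overlap measure the study used), the scoring step after a registration might look like this. The function names `dice_overlap` and `mean_landmark_error` are hypothetical, and the inputs are assumed to be flattened label lists and point lists already mapped into the target image space.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def dice_overlap(warped_labels: Sequence[int], target_labels: Sequence[int], label: int) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for one ROI label.

    Assumes both inputs are flattened label volumes of equal length (one label
    per voxel), with the source annotation already warped into target space.
    """
    if len(warped_labels) != len(target_labels):
        raise ValueError("label volumes must have the same number of voxels")
    size_a = sum(1 for v in warped_labels if v == label)
    size_b = sum(1 for v in target_labels if v == label)
    both = sum(1 for u, v in zip(warped_labels, target_labels) if u == label and v == label)
    if size_a + size_b == 0:
        return float("nan")  # ROI absent from both volumes: overlap is undefined
    return 2.0 * both / (size_a + size_b)


def mean_landmark_error(warped_points: Sequence[Point], target_points: Sequence[Point]) -> float:
    """Mean Euclidean distance (in the image's physical units, e.g. mm) between corresponding landmarks."""
    if len(warped_points) != len(target_points) or not warped_points:
        raise ValueError("need the same non-zero number of landmarks in both sets")
    total = sum(math.dist(p, q) for p, q in zip(warped_points, target_points))
    return total / len(warped_points)


if __name__ == "__main__":
    warped = [0, 1, 1, 2, 2, 0]
    target = [0, 1, 2, 2, 2, 0]
    print(dice_overlap(warped, target, label=2))  # 0.8
    print(mean_landmark_error([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)]))  # 2.5
```

Higher Dice (up to 1.0) and lower landmark error indicate better alignment. A study like the one described would aggregate such per-ROI and per-landmark scores over many registrations to compare algorithms.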