Fatyga Mirek, Dogan Nesrin, Weiss Elizabeth, Sleeman William C, Zhang Baoshe, Lehman William J, Williamson Jeffrey F, Wijesooriya Krishni, Christensen Gary E
Department of Radiation Oncology, Virginia Commonwealth University Medical Center, Richmond, VA, USA.
Department of Radiation Oncology, University of Virginia Health Systems, Charlottesville, VA, USA.
Front Oncol. 2015 Feb 4;5:17. doi: 10.3389/fonc.2015.00017. eCollection 2015.
Commonly used methods of assessing the accuracy of deformable image registration (DIR) rely on image segmentation or landmark selection. These methods are very labor intensive and are thus limited to a relatively small number of image pairs. A direct voxel-by-voxel comparison can be automated to examine fluctuations in DIR quality over a long series of image pairs.
A voxel-by-voxel comparison of three DIR algorithms applied to lung patients is presented. Registrations are compared using volume histograms formed both from individual DIR maps and from a voxel-by-voxel subtraction of two maps. When two DIR maps agree, one concludes that the maps are interchangeable in treatment planning applications, though one cannot conclude that either agrees with the ground truth. If two DIR maps disagree significantly, one concludes that at least one of the maps deviates from the ground truth. We use the method to compare three DIR algorithms applied to peak inhale-peak exhale registrations of 4DFBCT data obtained from 13 patients.
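The voxel-by-voxel subtraction of two DIR maps and the resulting volume histogram can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each DIR map is stored as a NumPy array of shape (Z, Y, X, 3) holding displacements in millimetres on a common grid. The function names `dvf_difference_magnitude` and `cumulative_volume_histogram` are hypothetical, as is the choice of a cumulative histogram.

```python
import numpy as np


def dvf_difference_magnitude(dvf_a, dvf_b):
    """Per-voxel length (mm) of the vector difference between two DIR maps.

    Assumption: both maps have shape (Z, Y, X, 3), share the same grid,
    and store displacements in millimetres.
    """
    diff = np.asarray(dvf_a, dtype=float) - np.asarray(dvf_b, dtype=float)
    return np.linalg.norm(diff, axis=-1)


def cumulative_volume_histogram(values, mask, thresholds):
    """Fraction of voxels inside `mask` whose value is >= each threshold.

    Assumption: `mask` is a non-empty boolean array with the same
    (Z, Y, X) shape as `values`, e.g. a contoured lung structure.
    """
    inside = np.asarray(values)[np.asarray(mask, dtype=bool)]
    return np.array([(inside >= t).mean() for t in thresholds])


# Toy example: two synthetic maps on a 4x4x4 grid.
dvf_a = np.zeros((4, 4, 4, 3))
dvf_b = np.zeros((4, 4, 4, 3))
dvf_b[..., 0] = 3.0        # 3 mm disagreement along the first axis everywhere
dvf_b[:2, ..., 1] = 4.0    # extra 4 mm disagreement in half of the volume

magnitude = dvf_difference_magnitude(dvf_a, dvf_b)
whole_volume = np.ones((4, 4, 4), dtype=bool)
fractions = cumulative_volume_histogram(magnitude, whole_volume, [0.0, 4.0, 10.0])
```

Under these assumptions, the fraction of a structure's volume above a chosen threshold (for example 10 mm) gives one number per image pair that can be tracked automatically over a long series of registrations.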
All three algorithms appear to be nearly equivalent when compared using Dice similarity coefficients. A comparison based on Jacobian volume histograms shows that all three algorithms measure changes in the total volume of the lungs with reasonable accuracy, but differ widely in the variance of the Jacobian distribution on contoured structures. Analysis of the voxel-by-voxel subtraction of DIR maps shows differences between algorithms that exceed a centimeter for some registrations.
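The two per-map metrics named here can be illustrated with a short sketch. It is not the authors' code. It assumes binary structure masks for the Dice coefficient, and a displacement field of shape (Z, Y, X, 3) whose component i is the displacement along array axis i, in the same physical units as the voxel spacing. The function names `dice` and `jacobian_determinant` are hypothetical, and finite differences via `np.gradient` are one possible discretisation among several.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total


def jacobian_determinant(dvf, spacing=(1.0, 1.0, 1.0)):
    """Per-voxel determinant of the Jacobian of x -> x + u(x).

    Values above 1 indicate local expansion, values below 1 local
    contraction. Assumption: `dvf` has shape (Z, Y, X, 3), with at least
    two voxels along each spatial axis.
    """
    dvf = np.asarray(dvf, dtype=float)
    jac = np.empty(dvf.shape[:3] + (3, 3))
    for i in range(3):
        # Derivatives of displacement component i along each spatial axis.
        grads = np.gradient(dvf[..., i], *spacing)
        for j in range(3):
            jac[..., i, j] = grads[j] + (1.0 if i == j else 0.0)
    return np.linalg.det(jac)


# Toy example: uniform 10% stretch along the first axis.
z = np.arange(5, dtype=float)
dvf = np.zeros((5, 4, 4, 3))
dvf[..., 0] = 0.1 * z[:, None, None]
jac = jacobian_determinant(dvf)
```

In this sketch, the mean of `jac` over a lung mask approximates the ratio of deformed to reference lung volume, while its variance over the same mask is the kind of quantity on which the algorithms are reported to differ.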
Deformation maps produced by DIR algorithms must be treated as mathematical approximations of physical tissue deformation that are not self-consistent and may thus be useful only in applications for which they have been specifically validated. The three algorithms tested in this work perform fairly robustly for the task of contour propagation, but produce potentially unreliable results for the task of dose-volume histogram (DVH) accumulation or measurement of local volume change. The performance of DIR algorithms varies significantly from one image pair to the next; hence validation efforts that are exhaustive but performed on a small number of image pairs may not reflect the performance of the same algorithm in practical clinical situations. Such efforts should be supplemented by validation based on a longer series of images of clinical quality.