Department of Radiology, The Netherlands Cancer Institute, POB 90203, 1006 BE, Amsterdam, The Netherlands.
GROW School for Oncology & Developmental Biology, University of Maastricht, Maastricht, The Netherlands.
Eur Radiol. 2022 Mar;32(3):1506-1516. doi: 10.1007/s00330-021-08251-8. Epub 2021 Oct 16.
To investigate sources of variation in a multicenter rectal cancer MRI dataset focusing on hardware and image acquisition, segmentation methodology, and radiomics feature extraction software.
T2W and DWI/ADC MRIs from 649 rectal cancer patients were retrospectively collected from 9 centers. Fifty-two imaging features (14 first-order/6 shape/32 higher-order) were extracted from each scan with two different software packages (PyRadiomics/CaPTk), using whole-volume (expert/non-expert) and single-slice segmentations. The influence of hardware, acquisition, and patient-intrinsic factors (age/gender/cTN-stage) on ADC was assessed using linear regression. Feature reproducibility between segmentation methods and between software packages was assessed using the intraclass correlation coefficient (ICC).
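The abstract does not state which form of the ICC was used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), applied to one feature measured on the same lesions by two segmentation methods or two software packages. The function name `icc_2_1` and the toy numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one row per lesion; each row holds the same
    feature as measured by each method (e.g. [expert, non_expert]).
    The choice of ICC(2,1) is an assumption, not the paper's stated method.
    """
    n = len(ratings)        # number of lesions (subjects)
    k = len(ratings[0])     # number of methods (raters)
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical feature values: identical measurements vs. a constant offset
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(icc_2_1(identical), icc_2_1(shifted))
```

Because this form measures absolute agreement, a constant offset between two methods lowers the ICC even when the lesions are ranked identically, which is one reason the chosen ICC form matters when comparing software packages.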
Image features differed significantly (p < 0.001) between centers with more substantial variations in ADC compared to T2W-MRI. In total, 64.3% of the variation in mean ADC was explained by differences in hardware and acquisition, compared to 0.4% by patient-intrinsic factors. Feature reproducibility between expert and non-expert segmentations was good to excellent (median ICC 0.89-0.90). Reproducibility for single-slice versus whole-volume segmentations was substantially poorer (median ICC 0.40-0.58). Between software packages, reproducibility was good to excellent (median ICC 0.99) for most features (first-order/shape/GLCM/GLRLM) but poor for higher-order (GLSZM/NGTDM) features (median ICC 0.00-0.41).
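To make the "variation explained" figures concrete, the following is a simplified sketch and not the study's regression model. It assumes a single categorical predictor (a hypothetical scanner label), in which case the R² of a linear regression with dummy-coded groups equals the between-group sum of squares divided by the total sum of squares. The function name `variance_explained` and all values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def variance_explained(values, groups):
    """Fraction of variance in `values` explained by one categorical factor.

    Equivalent to R^2 of a linear regression on dummy-coded `groups`.
    The study used several hardware and acquisition variables; a single
    factor is used here only to keep the sketch short.
    """
    grand = mean(values)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical mean ADC values (arbitrary units) from two scanner groups
adc = [1.0, 2.0, 3.0, 4.0]
scanner = ["A", "A", "B", "B"]
print(variance_explained(adc, scanner))
```

A value close to 1 means that most of the spread in the feature follows the grouping factor rather than differences between individual patients.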
Significant variations are present in multicenter MRI data, particularly related to differences in hardware and acquisition, and these will likely have a negative influence on subsequent analysis if left uncorrected. Segmentation variations had a minor impact when whole-volume segmentations were used. Between software packages, higher-order features were less reproducible, and caution is warranted when implementing these in prediction models.
• Features derived from T2W-MRI and in particular ADC differ significantly between centers when performing multicenter data analysis.
• Variations in ADC are mainly (> 60%) caused by hardware and image acquisition differences and less so (< 1%) by patient- or tumor-intrinsic variations.
• Features derived using different image segmentations (expert/non-expert) were reproducible, provided that whole-volume segmentations were used. When using different feature extraction software packages with similar settings, higher-order features were less reproducible.