Fatania Kavi, Frood Russell, Mistry Hitesh, Short Susan C, O'Connor James, Scarsbrook Andrew F, Currie Stuart
Department of Radiology, Leeds Teaching Hospitals NHS Trust, England, UK.
Leeds Institute of Medical Research, University of Leeds, Leeds, UK.
Eur Radiol. 2025 Jun;35(6):3354-3366. doi: 10.1007/s00330-024-11168-7. Epub 2024 Nov 28.
To assess the effect of different intensity standardisation techniques (ISTs) and ComBat batch sizes on the performance and stability of radiomics survival models in a heterogeneous, multi-centre cohort of patients with glioblastoma (GBM).
Multi-centre pre-operative MRI examinations acquired between 2014 and 2020 in patients with IDH-wildtype, unifocal WHO grade 4 GBM were retrospectively evaluated. WhiteStripe (WS), Nyul histogram matching (HM), and Z-score (ZS) ISTs were applied before radiomic feature (RF) extraction. RFs were then harmonised using ComBat with a minimum batch size (MBS) of 5, 10, or 15 patients. Cox proportional hazards models for overall survival (OS) prediction were built using five different feature selection strategies, and the impact of IST and MBS was evaluated using bootstrapping. Calibration, discrimination, relative explained variation, and model fit were assessed. Instability was evaluated using 95% confidence intervals (95% CIs), feature selection frequency, and calibration curves across the bootstrap resamples.
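The ISTs differ mainly in which intensities they anchor to. The sketch below is illustrative only and is not the study's pipeline. It assumes NumPy, a precomputed boolean brain mask, and a hypothetical landmark set (1st and 99th percentiles plus the deciles), and it shows Z-score standardisation and a simplified Nyul piecewise-linear histogram matching on a 0 to 100 standard scale. WhiteStripe, which rescales using the mean and SD of a narrow band of normal-appearing white matter intensities, is omitted. All function names are hypothetical.

```python
import numpy as np

def zscore_standardise(image, mask):
    """Z-score (ZS): centre and scale by the mean and SD of voxels inside the mask."""
    voxels = image[mask]
    return (image - voxels.mean()) / voxels.std()

# Hypothetical landmark set: 1st and 99th percentiles plus the deciles.
PERCENTILES = np.array([1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99])

def image_landmarks(image, mask):
    return np.percentile(image[mask], PERCENTILES)

def nyul_train(images, masks, s_min=0.0, s_max=100.0):
    """Learn standard landmarks: rescale each image so its 1st/99th percentiles
    map to s_min/s_max, then average the rescaled landmarks across images."""
    rescaled = []
    for image, mask in zip(images, masks):
        lm = image_landmarks(image, mask)
        rescaled.append(s_min + (lm - lm[0]) * (s_max - s_min) / (lm[-1] - lm[0]))
    return np.mean(rescaled, axis=0)

def nyul_transform(image, mask, standard):
    """Piecewise-linear mapping of this image's landmarks onto the standard ones.
    np.interp clamps intensities beyond the end landmarks (a simplification;
    Nyul's method extrapolates linearly instead)."""
    return np.interp(image, image_landmarks(image, mask), standard)

# Usage on two synthetic "scans" acquired with very different intensity scales.
rng = np.random.default_rng(0)
scan_a = rng.normal(100.0, 20.0, size=(8, 8, 8))
scan_b = rng.normal(500.0, 80.0, size=(8, 8, 8))
mask = np.ones(scan_a.shape, dtype=bool)  # whole volume treated as "brain" here

standard = nyul_train([scan_a, scan_b], [mask, mask])
hm_a = nyul_transform(scan_a, mask, standard)
hm_b = nyul_transform(scan_b, mask, standard)
zs_a = zscore_standardise(scan_a, mask)
```

After matching, both synthetic scans share approximately the same landmark intensities, whereas Z-scoring only fixes the mean and SD of each scan independently.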
One hundred ninety-five patients were included. Median OS was 13 months (95% CI: 12-14). Twelve to fourteen unique MRI protocols were used per MRI sequence. HM and WS produced the largest relative increases in model discrimination, explained variation, and model fit, but the choice of IST had little effect on stability or calibration. Larger ComBat batches improved discrimination, model fit, and explained variation, but a higher MBS (and hence a smaller sample size) reduced stability across all performance metrics and reduced calibration accuracy.
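The trade-off between MBS and sample size can be sketched as follows, assuming (as the parenthetical "reduced sample size" implies) that patients whose acquisition protocol contributes fewer than MBS cases are excluded before harmonisation. The harmonisation step is a simplified per-feature location/scale adjustment in the spirit of ComBat: it omits ComBat's empirical Bayes shrinkage and biological covariates and standardises with the overall SD rather than a residual pooled variance, so it is not a substitute for a real ComBat implementation. Protocol names, offsets, and function names are hypothetical.

```python
from collections import Counter

import numpy as np

def filter_min_batch_size(features, batches, mbs):
    """Keep only patients whose batch (e.g. MRI protocol) has at least `mbs` members."""
    batches = np.asarray(batches)
    counts = Counter(batches.tolist())
    keep = np.array([counts[b] >= mbs for b in batches.tolist()], dtype=bool)
    return features[keep], batches[keep]

def location_scale_harmonise(features, batches):
    """Per-feature location/scale batch adjustment: ComBat-like, but without
    empirical Bayes shrinkage or covariates. `batches` must be a NumPy array."""
    grand_mean = features.mean(axis=0)
    overall_sd = features.std(axis=0)
    z = (features - grand_mean) / overall_sd
    out = np.empty_like(z)
    for b in np.unique(batches):
        idx = batches == b
        gamma = z[idx].mean(axis=0)  # additive (location) batch effect
        delta = z[idx].std(axis=0)   # multiplicative (scale) batch effect
        out[idx] = (z[idx] - gamma) / delta
    return out * overall_sd + grand_mean

# Hypothetical cohort: three protocols contributing 12, 7 and 4 patients,
# each adding a protocol-specific offset to three synthetic radiomic features.
rng = np.random.default_rng(42)
sizes = {"protocol_A": 12, "protocol_B": 7, "protocol_C": 4}
offsets = {"protocol_A": 0.0, "protocol_B": 2.0, "protocol_C": -1.5}
batches = [name for name, n in sizes.items() for _ in range(n)]
features = np.array([rng.normal(offsets[b], 1.0, size=3) for b in batches])

for mbs in (5, 10, 15):
    _, kept = filter_min_batch_size(features, batches, mbs)
    print(mbs, len(kept))  # prints "5 19", then "10 12", then "15 0"

X5, b5 = filter_min_batch_size(features, batches, 5)
harmonised = location_scale_harmonise(X5, b5)
```

With these synthetic batch sizes, MBS = 5 keeps 19 patients, MBS = 10 keeps 12 (a single batch, so there is nothing left to harmonise), and MBS = 15 keeps none, which illustrates why larger batches give cleaner per-batch estimates at the cost of fewer patients for model fitting.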
Heterogeneous, real-world GBM data pose a challenge to the reproducibility of radiomics. ComBat generally improved model performance as MBS increased but reduced stability and calibration. HM and WS tended to improve model performance.
Question: ComBat harmonisation of RFs and intensity standardisation of MRI have not been thoroughly evaluated in multi-centre, heterogeneous GBM data.
Findings: The addition of ComBat and ISTs can improve discrimination, relative model fit, and explained variance, but it degrades the calibration and stability of survival models.
Clinical relevance: Radiomics risk prediction models in real-world, multi-centre contexts could be improved by ComBat and ISTs. However, these steps degrade calibration and prediction stability, and this must be thoroughly investigated before patients can be accurately separated into different risk groups.