Strotzer Quirin D, Wagner Thomas, Angstwurm Pia, Hense Katharina, Scheuermeyer Lucca, Noeva Ekaterina, Dinkel Johannes, Stroszczynski Christian, Fellner Claudia, Riemenschneider Markus J, Rosengarth Katharina, Pukrop Tobias, Wiesinger Isabel, Wendl Christina, Schicho Andreas
Department of Radiology, University Medical Center Regensburg, Regensburg, Germany.
Division of Neuroradiology, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.
Neurooncol Adv. 2024 Apr 20;6(1):vdae060. doi: 10.1093/noajnl/vdae060. eCollection 2024 Jan-Dec.
Growing research demonstrates the ability to predict histology or genetic information of various malignancies using radiomic features extracted from imaging data. This study aimed to investigate MRI-based radiomics in predicting the primary tumor of brain metastases through internal and external validation, using oversampling techniques to address the class imbalance.
This IRB-approved retrospective multicenter study included brain metastases from lung cancer, melanoma, breast cancer, colorectal cancer, and a combined heterogeneous group of other primary entities (5-class classification). Local data were acquired between 2003 and 2021 from 231 patients (545 metastases). External validation was performed with 82 patients (280 metastases) and 258 patients (809 metastases) from the publicly available Stanford BrainMetShare and the University of California San Francisco Brain Metastases Stereotactic Radiosurgery datasets, respectively. Preprocessing included brain extraction, bias correction, coregistration, intensity normalization, and semi-manual binary tumor segmentation. A total of 2,528 radiomic features were extracted from T1w (± contrast), fluid-attenuated inversion recovery (FLAIR), and wavelet transforms for each sequence (8 decompositions). Random forest classifiers were trained with selected features on original and oversampled data (5-fold cross-validation) and evaluated on internal/external holdout test sets using accuracy, precision, recall, F1 score, and area under the receiver-operating characteristic curve (AUC).
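The evaluation metrics named above behave very differently under class imbalance, which a small worked example may make concrete. The sketch below is not the authors' code: the abstract does not state how per-class scores were averaged, so macro averaging is an assumption here, and the label lists and the function name `macro_scores` are hypothetical toy values chosen only for illustration.

```python
# Illustrative sketch only: macro averaging is an assumption, and the toy
# labels below are hypothetical, not data from the study.

CLASSES = ["lung", "melanoma", "breast", "colorectal", "other"]


def macro_scores(y_true, y_pred, classes):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Undefined ratios (no predictions or no true cases) are scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical imbalanced test set: 6 lung lesions, 1 of each other class.
y_true = ["lung"] * 6 + ["melanoma", "breast", "colorectal", "other"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["lung"] * 10

scores = macro_scores(y_true, y_pred, CLASSES)
print(scores)
```

In this toy case the always-majority classifier reaches an accuracy of 0.6 while its macro-averaged recall is only 0.2, which is why accuracy alone is a weak summary for an imbalanced 5-class problem.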
Oversampling did not improve the overall unsatisfactory performance on the internal and external test sets. Incorrect data partitioning (oversampling before train/validation/test split) leads to a massive overestimation of model performance.
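The partitioning error described above can be reproduced on data that contains no signal at all. The following is a minimal sketch under stated assumptions, not the study's pipeline: it uses plain random oversampling (duplication) and a 1-nearest-neighbour classifier in place of the random forest, two classes instead of five, and pure-noise features; all function names and sample sizes are hypothetical. When oversampling is done before the split, copies of the same minority sample land in both the training and the test partition, so the test score is inflated.

```python
# Illustrative sketch only: 1-NN and random duplication are stand-ins chosen
# to keep the example dependency-free; they are not the study's methods.
import random


def make_noise_dataset(n_major, n_minor, n_features, rng):
    """Two-class data with pure-noise features, so there is nothing to learn."""
    labels = [0] * n_major + [1] * n_minor
    return [([rng.gauss(0.0, 1.0) for _ in range(n_features)], y) for y in labels]


def random_oversample(samples, rng):
    """Duplicate samples of smaller classes at random until classes are balanced."""
    by_class = {}
    for sample in samples:
        by_class.setdefault(sample[1], []).append(sample)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced


def split(samples, test_fraction, rng):
    """Shuffle and split into (train, test)."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Label of the training sample closest to x (squared Euclidean distance)."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def minority_recall(train, test):
    """Fraction of minority-class (label 1) test samples predicted as minority."""
    minority = [s for s in test if s[1] == 1]
    hits = sum(1 for x, _ in minority if predict_1nn(train, x) == 1)
    return hits / len(minority)


def run_demo(seed):
    rng = random.Random(seed)
    data = make_noise_dataset(n_major=300, n_minor=60, n_features=10, rng=rng)

    # Incorrect order: oversample first, then split (duplicates leak into test).
    leaky_train, leaky_test = split(random_oversample(data, rng), 0.3, rng)
    leaky = minority_recall(leaky_train, leaky_test)

    # Correct order: split first, then oversample the training partition only.
    train, test = split(data, 0.3, rng)
    clean = minority_recall(random_oversample(train, rng), test)
    return leaky, clean


leaky_recall, clean_recall = run_demo(seed=0)
print(f"oversample before split: minority recall = {leaky_recall:.2f}")
print(f"oversample after split:  minority recall = {clean_recall:.2f}")
```

Because the features are noise, any apparent skill in the first setting comes from duplicated samples shared between partitions rather than from the model. Interpolating oversamplers would be expected to leak in a similar, if less extreme, way, since the synthetic points are built from samples that later end up in the test set.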
Radiomics models' capability to predict histologic or genomic data from imaging should be critically assessed; external validation is essential.