Université Paris-Saclay, Institut Gustave Roussy, Inserm, Radiothérapie Moléculaire et Innovation Thérapeutique, 94800, Villejuif, France.
Institut Gustave Roussy, Département de Radiothérapie, 94800, Villejuif, France.
Sci Rep. 2022 Jul 26;12(1):12762. doi: 10.1038/s41598-022-16609-1.
The use of multicentric data is becoming essential for developing generalizable radiomic signatures. In particular, Magnetic Resonance Imaging (MRI) data used in brain oncology are often heterogeneous in terms of scanners and acquisitions, which significantly impact quantitative radiomic features. Various methods have been proposed to decrease this dependency, including methods acting directly on MR images, i.e., applying several preprocessing steps before feature extraction, and the ComBat method, which harmonizes the radiomic features themselves. The ComBat method as used for radiomics may be misleading and presents some limitations, such as the need to know the labels associated with the "batch effect". In addition, a statistically representative sample is required, and a signature cannot be applied to data whose batch label is not present in the training set. This work aimed to compare a priori and a posteriori radiomic harmonization methods and to propose a code adaptation that is compatible with machine learning. Furthermore, we have developed AutoComBat, which aims to automatically determine the batch labels, using either MRI metadata or quality metrics as inputs to the proposed constrained clustering. A heterogeneous dataset consisting of high- and low-grade gliomas from eight different centers was considered. The methods were compared based on their ability to decrease the relative standard deviation of radiomic features extracted from white matter and on their performance on a classification task using different machine learning models. ComBat, AutoComBat using image-derived quality metrics as inputs for batch assignment, and preprocessing methods presented promising results on white matter harmonization, but with no clear consensus across all MR images. Preprocessing showed the best results on the T1w-gd images for the grading task. For T2w-flair, AutoComBat, using either metadata plus quality metrics or metadata alone as inputs, performed better than conventional ComBat, highlighting its potential for data harmonization. Our results depend on MRI weighting, feature class, and task, and require further investigation on other datasets.
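To make two ideas from the abstract concrete, the following is a minimal sketch of (a) the relative standard deviation used as a harmonization metric and (b) a harmonizer that is fitted on training data only and then applied to new data, which is what "machine learning compatible" is taken to mean here. It is not the authors' ComBat or AutoComBat code: it is a simplified per-batch location/scale alignment that omits ComBat's empirical Bayes shrinkage and covariate preservation, and it assumes batch labels are given rather than found by clustering. The names `relative_std` and `LocationScaleHarmonizer`, and the synthetic two-scanner data, are hypothetical.

```python
import numpy as np


def relative_std(X):
    """Relative standard deviation (in %) of each feature column."""
    X = np.asarray(X, dtype=float)
    return X.std(axis=0, ddof=1) / np.abs(X.mean(axis=0)) * 100.0


class LocationScaleHarmonizer:
    """Simplified batch alignment (assumption: no empirical Bayes, no covariates).

    Each batch is standardized with its own mean and standard deviation, then
    rescaled to the grand mean and pooled within-batch standard deviation
    estimated on the training data.
    """

    def fit(self, X, batches):
        X = np.asarray(X, dtype=float)
        batches = np.asarray(batches)
        labels = np.unique(batches)
        self.grand_mean_ = X.mean(axis=0)
        self.params_ = {}
        within_ss = np.zeros(X.shape[1])
        for b in labels:
            Xb = X[batches == b]
            mean_b = Xb.mean(axis=0)
            std_b = Xb.std(axis=0, ddof=1)
            # Guard against constant features within a batch.
            self.params_[b] = (mean_b, np.where(std_b == 0, 1.0, std_b))
            within_ss += ((Xb - mean_b) ** 2).sum(axis=0)
        self.pooled_std_ = np.sqrt(within_ss / (len(X) - len(labels)))
        return self

    def transform(self, X, batches):
        X = np.asarray(X, dtype=float)
        batches = np.asarray(batches)
        out = np.empty_like(X)
        for b in np.unique(batches):
            if b not in self.params_:
                # The limitation noted in the abstract: unseen batch labels
                # cannot be harmonized with parameters learned in training.
                raise ValueError(f"batch {b!r} was not seen during fit")
            mean_b, std_b = self.params_[b]
            mask = batches == b
            out[mask] = (X[mask] - mean_b) / std_b * self.pooled_std_ + self.grand_mean_
        return out


# Synthetic example: the same three features measured on two hypothetical scanners.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(10.0, 1.0, (50, 3)), rng.normal(14.0, 2.0, (50, 3))])
batches = np.array(["A"] * 50 + ["B"] * 50)

harmonizer = LocationScaleHarmonizer().fit(X, batches)
X_harmonized = harmonizer.transform(X, batches)

rsd_before = relative_std(X)
rsd_after = relative_std(X_harmonized)
```

In a classification pipeline, `fit` would be called on the training folds only and `transform` on both training and held-out data, so that no information from the test set leaks into the harmonization parameters.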